Abstract

We analyze the Schelling model of segregation in which a society of n individuals live in a 
ring. Each individual is one of two races and is only satisfied with his location so long as at least 
half his 2w nearest neighbors are of the same race as him. In the dynamics, randomly-chosen 
unhappy individuals successively swap locations. We consider the average size of monochromatic 
neighborhoods in the final stable state. Our analysis is the first rigorous analysis of the Schelling 
dynamics. We note that, in contrast to prior approximate analyses, the final state is nearly 
integrated: the average size of monochromatic neighborhoods is independent of n and polynomial 
1 Introduction

In 1969, economist Thomas Schelling introduced a landmark model of racial segregation. Elegantly
simple and easy to simulate, it provided a persuasive explanation of an unintuitive result: that
local behavior can cause global effects that are undesired by all [45]. In Schelling's model,
individuals of two races, denoted x and o, are placed in proximity to one another, either in a line
(the one-dimensional model) or in a grid (the two-dimensional model). This represents a mixed-race
city, where individuals of different races live in close proximity. Individuals are satisfied if at least
a fraction r of the other agents in a small local neighborhood around them are of the same type.
Unhappy agents can move locations, either by inserting themselves into new positions or by
exchanging locations with other agents. Schelling showed via small simulations that global segregation
can occur even when no individual prefers segregation. In his experiments, he found that on
average, an individual i with r = 1/2 ended up in a significantly more segregated neighborhood, with
approximately 80% of i's neighbors sharing i's type.

The striking contrast between individual preferences and global effects captured the imagina- 
tions of sociologists, economists and physicists. Schelling's model eloquently argues that while all 
individuals in a community may prefer integration, the global consequence of their actions may 
be complete segregation. Empirical evidence, found through surveys and statistics of segregated 
communities, indicates that the effects of local organization seen in Schelling's spatial proximity 
model may also lead to real-world segregation, whether by ethnicity [3"1 \TT \ \TE \ |2" H 1^ 1 155 } \13 \ |36 | 157],
religion [21] , or socioeconomic factors [7J El [39] . 

However, it is surprisingly difficult to analytically prove or even rigorously define the segregation 
phenomenon observed qualitatively in simulations. Much of this difficulty lies in the fact that the 
dynamics may converge to a variety of states (complete segregation, complete integration, and 
various partially segregated states), and so the underlying Markov chain does not have a
well-defined unique stationary distribution. Prior work [54] [55] [56] [57] circumvents this difficulty by
introducing perturbations in the dynamics, allowing individuals to perform detrimental actions with 
vanishingly small probability. The research then analyzes the degree of segregation in stochastically 
stable states, finding generally that as time approaches infinity, complete segregation is inevitable. 

We instead analyze the one-dimensional segregation dynamics directly, providing the first rig- 
orous analysis of an unperturbed Schelling model. Our model considers a society of n individuals 
arranged in a ring network. We start from a random initial configuration in which each individual 
is assigned type x or o independently and uniformly at random. We parameterize the neighbor- 
hood size in the Schelling dynamics by w and then consider the model of dynamics under which 
random pairs of unhappy individuals of opposite types trade places in each time step. In the initial 
configuration, the average size of a monochromatic neighborhood is constant. We show that with 
high probability, the dynamics converge to a stable configuration. In stark contrast to previous 
analyses of approximate dynamics, we prove that once the dynamics converge, most individuals 
reside in nearly integrated neighborhoods; that is, the average run length in the final configuration 
is independent of n and only polynomial in w. Thus, contrary to the established intuition, the 
local dynamics of Schelling's model do not induce global segregation in proportion to the size of the 
society but rather induce only a small degree of local segregation. This does not contradict empir- 
ical studies, which are often performed over small local populations, but indicates that the results 
found in smaller communities do not necessarily generalize to larger populations. Our results are 
in accord with empirical studies of residential segregation in large populations [Ml [251 ED] which 
consistently find amounts of spatial autocorrelation that are intermediate between purely random 
and completely segregated configurations. 

Our techniques As is common in the analysis of non-stationary random processes on graphs and 
other discrete structures, our process can be analyzed by defining state variables whose expected 
values at any time can easily be estimated based on their values in the preceding time step. The 
time-evolution of these state variables, after a suitable renormalization, can be approximated with a 
continuous-time vector valued process that is not random at all, but satisfies a differential equation 
obtained from the leading-order terms in the aforementioned relations between successive time steps. 
Wormald [51] [52] supplied a general theorem that rigorously justifies the use of such differential 
equation approximations in a wide range of cases. In the context of Schelling segregation, the 
natural parametrization of the state is a vector with infinitely many components, reflecting the 
normalized frequency of each finite string of x's and o's as one looks at all labeled subintervals of 
the ring. There are two reasons why it is not straightforward to apply Wormald's technique in this 
setting. First, the system of differential equations has infinitely many variables and infinitely long 
dependency chains; Wormald's technique only applies when there is no infinite sequence of distinct 
variables $y_1, y_2, \ldots$ such that the derivative of $y_i$ depends on the value of $y_{i+1}$ for all $i$. Second, the
continuous-time system is much too complex to analyze its solution directly, so it is unclear how 
to extract any meaningful bounds on the eventual distribution of run-lengths by working directly 



with the differential equation. 

We circumvent the first difficulty by partitioning the ring into bounded-sized pieces using small 
separators, and analyzing a different random process taking place on the partitioned graph. A 
coupling argument shows that after running the bounded-size process for O(n) steps, with high
probability its state vector closely approximates the relevant components of the original state 
vector. The new process has only finitely many state variables, so Wormald's technique is applicable. 
We believe that this technique of partitioning along small separators to remove weak long-range 
dependencies may be of use in other applications of Wormald's technique that suffer from infinitely 
long dependency chains. 

To overcome the second difficulty, rather than analyzing the differential equation solution di- 
rectly, we focus on the presence of a particular configuration that we call a firewall: a string of 
w + 1 consecutive individuals of the same type. These configurations are stable for the segregation 
process: the neighborhood of each element in a firewall contains at least w elements of its own 
type, so for r = 1/2, once a firewall is formed, it cannot be subsequently broken. Thus, to prove
that a typical site never belongs to a monochromatic run of superpolynomial length, it suffices to
show that firewalls of both colors form within a distance of poly(w) on both sides of a given site,
with high probability. We prove this by defining a special type of configuration called a firewall
incubator, which occurs with probability $\Omega(1)$ at any given site in the initial configuration, and has
probability 1/poly(w) of developing into a firewall wherever it occurs in the initial configuration.
The proof of the latter fact depends on a symmetry property of the differential equation: it is 
invariant under the Z/(2) action that exchanges x's and o's, and therefore the fixed points of this 
Z/(2) action are an invariant set for the differential equation. This symmetry allows us to reduce 
our problem to the analysis of a simpler process in which every step consists of selecting a single 
random site and changing its color if it is unhappy. 

1.1 Related Work 

Schelling's proximity model of segregation, first introduced in 1969 [35], inspired significant research 
into understanding the dynamics of prejudice and self-isolation of communities. Schelling defined a
one-dimensional proximity model in which a community is represented by individuals placed next
to one another in a line. Each individual's neighborhood is composed of the w closest elements
on either side. In Schelling's simulations, he let w = 4, so each element's neighborhood contained
8 elements: the 4 nearest elements on each side. The satisfaction of the individuals in the line
was parametrized by a single global tolerance value, r. For any individual i, if fewer than 2wr of
i's neighbors were of i's type, then i was considered unhappy. Unhappy individuals were chosen
one at a time and inserted into the nearest position where their tolerance requirement would be
satisfied. In small simulations (with N = 60 agents), he found that even when r < 1/2, most agents
ended up in fully segregated neighborhoods [45]. Schelling continued to build upon this initial
work, introducing a two-dimensional model where agents were placed on a partially empty grid.
Unhappy agents could move to empty positions on the grid where they would become happier.
Again, via small simulations, Schelling concluded that even with r < 1/2, segregation was essentially
inevitable [46] [47]. Significant research has been done to extend the Schelling spatial proximity
model [61 d El [201 [301 [321 [331 SS] and analyze it P [201 [32l S21 ESI [561 E3 El] - 

Young was the first to present a more rigorous analysis of the model [54]. He utilized the
technique of stochastic stability, developed in evolutionary game theory, to analyze a variant of 
the one-dimensional Schelling spatial proximity model. In Young's version of the model, agents 



are placed on a ring and have neighborhood width w = 1 and tolerance r = 1/2. Given values
$0 < a < b < c$ and $0 < \varepsilon < 1$, at each time step, two agents are chosen at random and trade places
with probability 1 if the trade makes both happy, $\varepsilon^a$ if one changes from unhappy to happy and the
other vice versa, $\varepsilon^b$ if both are initially happy and one becomes unhappy, and $\varepsilon^c$ if both change from
happy to unhappy. Young utilizes the concept of stochastically stable states in a coordination game
to analyze the Markov chain and its possible perturbations, concluding that with high probability
as $\varepsilon \to 0$, total segregation will result. Further generalizations and variants of this model have been
rigorously analyzed by Zhang [55] [56] [57], Barde [6], Dall'Asta et al. [20], Pancs and Vriend [12],
Grauwin [32], and others.

Our model differs from previous research and returns to a model closer to Schelling's original 
in that agents do not take utility-decreasing moves and are not analyzed in terms of bounded 
neighborhoods. We find that these simple differences lead to substantially altered results. 

Stochastic models of residential segregation fit within the broader scope of social science mod- 
els that study the aggregate behavior of large networks of agents each individually executing very 
simple, myopic, often randomized procedures to update their behavior in response to the behav- 
ior of their neighbors. Highlights of this line of work include evolutionary game theory and the 
study of stochastically stable states [23 EH 155] , the analysis of coordination games played on 
networks [221 HI], analysis of repeated best-response dynamics in large games [HI [31], and re- 
search drawing explicit parallels between statistical mechanics models and the dynamics of large- 
population games [10] . 

Finally, our paper belongs to a long line of papers in theoretical computer science and discrete 
probability that apply differential equations to analyze the dynamics of non-stationary random 
processes. An early application of this technique is Karp and Sipser's [35] analysis of a random 
greedy matching algorithm in random graphs. Differential equations have also been applied to 
analyze algorithms for random k-SAT instances [H fT5| [T9l 129], study component sizes in random
graphs [H HO] , and to analyze "Achlioptas processes" in which edges are added to an initially empty 
graph by an algorithm selecting among a bounded number of random choices [21 IT2"1 [271 |4"4"1 149] . 
Wormald [51] [52] provides very general conditions under which differential equation approximations
such as these are guaranteed to have o(n) additive error in the large-n limit. 

2 Preliminaries 

We consider a society of n individuals. An individual's type is either x or o, with a probability p of 
being x and (1 — p) of being o. Here we take p = 1/2. Individuals live in a ring network represented 
by an n-node cycle. At each point in time, there is a bijective mapping between individuals and 
nodes. Each individual lives in a neighborhood, defined to be his 2w + 1 nearest neighbors (including
himself) for a parameter $w \ll n$, i.e., the neighborhood of an individual at node i in the ring consists
of the set of individuals at nodes $\{(i - w) \bmod n, \ldots, (i + w) \bmod n\}$. The parameter w is called
the window size. 

We say an individual is happy if at least an r fraction of his neighbors are of the same type as
him. The parameter r is called the tolerance parameter and here is assumed to be 1/2. At any
given time step, two individuals are chosen uniformly at random and swap nodes according to the
following rule. If both individuals are unhappy and are of opposite types (and would therefore be
happy¹ in the other's node), then they swap nodes.
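The swap rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function names (`is_happy`, `propose_swap`, `run_dynamics`), the list-of-characters representation of the ring, and the seeding scheme are all hypothetical choices made for the example. It takes r = 1/2, so an individual is happy when at least w + 1 of the 2w + 1 nodes in his neighborhood (himself included) carry his own label.

```python
import random


def is_happy(labels, i, w):
    """True if at least half of the 2w+1 nodes centered at i (including i) share i's label."""
    n = len(labels)
    same = sum(labels[(i + d) % n] == labels[i] for d in range(-w, w + 1))
    return 2 * same >= 2 * w + 1


def propose_swap(labels, w, rng):
    """One time step: pick two nodes uniformly at random and swap them
    only if they are oppositely labeled and both unhappy."""
    n = len(labels)
    i, j = rng.randrange(n), rng.randrange(n)
    if labels[i] != labels[j] and not is_happy(labels, i, w) and not is_happy(labels, j, w):
        labels[i], labels[j] = labels[j], labels[i]
        return True
    return False


def run_dynamics(n, w, steps, seed=0):
    """Start from a uniformly random labeling and run a fixed number of proposed swaps."""
    rng = random.Random(seed)
    labels = [rng.choice("xo") for _ in range(n)]
    swaps = sum(propose_swap(labels, w, rng) for _ in range(steps))
    return labels, swaps
```

Because a swap only exchanges two labels, the number of x's and o's is conserved throughout; only their arrangement on the ring changes.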



1 Here we are applying the assumption that r = 1/2. When r > 1/2 an unhappy individual may remain unhappy 



A block is a sequence of adjacent sites. A run is a block whose nodes are identically labeled. 
(Throughout the paper, we use the term label to denote the type of the individual living at a given 
node.) 
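Since the main results are stated in terms of run lengths, it may help to see how runs are measured on a ring, where a run can wrap around the end of the array. The following is a small sketch; the names `run_lengths` and `average_run_length` are hypothetical, and the labeling is assumed to be given as a string or list of 'x' and 'o'.

```python
def run_lengths(labels):
    """Lengths of the maximal monochromatic runs of a cyclic labeling."""
    n = len(labels)
    if n == 0:
        return []
    if all(c == labels[0] for c in labels):
        return [n]  # the whole ring is a single run
    # Rotate so that position 0 is the first node of a run
    # (labels[i - 1] wraps around to the last node when i == 0).
    start = next(i for i in range(n) if labels[i] != labels[i - 1])
    rotated = labels[start:] + labels[:start]
    runs, count = [], 1
    for prev, cur in zip(rotated, rotated[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


def average_run_length(labels):
    """Ring size divided by the number of maximal runs."""
    return len(labels) / len(run_lengths(labels))
```

For example, on the ring "xxoox" the trailing x joins the two leading x's, so the runs have lengths 3 and 2.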

A key observation underlying our analysis of the Schelling process is that individuals in large 
enough runs never have an incentive to move. Define a firewall as a run of length at least w + 1; a 
firewall is either an x-firewall or an o-firewall depending on the labels of its nodes. For a segregation 
process with tolerance r = 1/2, since individuals living in a firewall have at least w adjacent neighbors
of the same type, all elements in the firewall are happy and will remain so, independent of the labels 
of the nodes around the firewall. Therefore, once a firewall is created, no individual in the firewall 
will move and the configuration will remain stable for the remainder of the process. 
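The stability claim can be checked mechanically on small examples. The sketch below (hypothetical helper names, labeling assumed to be a string or list over 'x' and 'o', tolerance r = 1/2) marks every node that lies in a run of length at least w + 1; each such node should be happy regardless of what surrounds the firewall.

```python
def is_happy(labels, i, w):
    """True if at least half of the 2w+1 nodes centered at i share i's label."""
    n = len(labels)
    same = sum(labels[(i + d) % n] == labels[i] for d in range(-w, w + 1))
    return 2 * same >= 2 * w + 1


def firewall_sites(labels, w):
    """Indices of nodes belonging to a monochromatic run of length >= w + 1.

    A node lies in such a run exactly when it is covered by some window of
    w + 1 consecutive, identically labeled nodes (assumes w + 1 <= len(labels)).
    """
    n = len(labels)
    sites = set()
    for start in range(n):
        window = [(start + d) % n for d in range(w + 1)]
        if all(labels[k] == labels[start] for k in window):
            sites.update(window)
    return sites
```

On the ring "xxxoxoxo" with w = 2, only the first three nodes form a firewall, and all three are happy even though the rest of the ring alternates.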

In addition, the existence of at least one firewall in the initial configuration also guarantees that 
the process will eventually reach a frozen configuration in which no further swaps are possible. 

Proposition 2.1. Consider the segregation process with window size w on a ring network of size n. 
For any fixed w, as $n \to \infty$, the probability that the process eventually reaches a frozen configuration
converges to 1. 

Proof. A potential function that verifies this fact is the number of individuals belonging to firewalls. 
Let $S_0(t)$ denote the set of individuals belonging to firewalls at time t, and let $S_1(t)$ denote its
complement. As long as $S_0(t)$ and $S_1(t)$ are both nonempty, there will be an individual $a \in S_1(t)$
neighboring an individual $b \in S_0(t)$. These two must be oppositely labeled, as otherwise a would
belong to the same firewall as b. Individual a must be unhappy: assuming w.l.o.g. that b lives to 
the right of a and that the label of a is x, then all of a's neighbors on the right are labeled o, and at 
least one of a's neighbors on the left is labeled o (as otherwise a itself would belong to an x-firewall), 
and hence a is unhappy. If there are unhappy individuals of both labels, then there is a positive 
probability that a will swap with an oppositely labeled unhappy individual and that individual will 
then join a firewall, increasing the potential function. If all of the unhappy individuals are labeled 
x, then all of the o-type individuals are already living in firewalls and the configuration is already 
frozen. 

Thus, we have defined an integer-valued potential function, taking values between 0 and n, that
always has positive probability of increasing unless either the configuration is already frozen, or the
initial configuration contained no firewalls. Finally, the probability that the initial configuration
contains no firewalls is certainly o(1): partition the ring into n/(w + 1) blocks, each of which
independently has probability $2^{-w}$ of being a firewall in the initial configuration. The probability
that none of them are firewalls is $(1 - 2^{-w})^{n/(w+1)}$, which is o(1) as $n \to \infty$. □
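To get a feel for how quickly the final estimate decays, the bound can be evaluated numerically. The helper below is a hypothetical illustration; it rounds n/(w + 1) down to a whole number of blocks, which is an assumption of this sketch. Since it only considers firewalls aligned with the block partition, the value is an upper bound on the probability that the initial configuration has no firewall at all.

```python
def no_firewall_bound(n, w):
    """Upper bound (1 - 2^-w)^(floor(n / (w + 1))) on the probability that a
    uniformly random labeling of an n-node ring contains no firewall."""
    blocks = n // (w + 1)
    # Each block of w + 1 nodes is monochromatic with probability 2 * 2^-(w+1) = 2^-w.
    return (1.0 - 2.0 ** (-w)) ** blocks
```

For fixed w the bound decays exponentially in n, so even moderately large rings contain a firewall with overwhelming probability.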

Firewalls are therefore stable, segregated configurations which guarantee the eventual termina- 
tion of the process. 

3 Analysis 

In this section we prove bounds on the run-length distribution of the segregation model. The 
main idea behind our analysis is to show that firewalls occur fairly frequently. We will show that 



after swapping with an oppositely colored unhappy individual. In such a case, a natural modeling assumption is 
that the swap still takes place as long as each individual has at least as many neighbors of the same type at his new 
location as at his former location. 



for any site on the ring, with high probability, the process will eventually form firewalls of both
colors on both sides of the site, within its poly(w) nearest neighbors. To do so, we define a type of
configuration that we call a firewall incubator and we show that it occurs reasonably frequently in a
random 2-coloring of the ring: every site has probability $\Omega(1)$ of belonging to a firewall incubator in
the initial configuration. The main part of the proof is devoted to showing that firewall incubators
are reasonably likely to develop into firewalls; the probability is at least 1/poly(w). To prove this,
it is easier to analyze a related stochastic process in which one step consists of choosing a single
site and reversing its color if it is unhappy. The comparison with this process is justified if the
ratio of unhappy x's to unhappy o's is equal to $1 \pm o(1)$ throughout the time interval of interest;
we prove that this is so, with high probability, by an application of Wormald's differential equation
technique.
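The related single-site process mentioned above is simple enough to state as code. This is a sketch under assumptions (hypothetical function names, labels stored in a mutable list, tolerance r = 1/2); it is meant only to make the one-step rule concrete, not to reproduce the analysis.

```python
import random


def is_happy(labels, i, w):
    """True if at least half of the 2w+1 nodes centered at i share i's label."""
    n = len(labels)
    same = sum(labels[(i + d) % n] == labels[i] for d in range(-w, w + 1))
    return 2 * same >= 2 * w + 1


def flip_step(labels, w, rng):
    """Choose a single site uniformly at random and reverse its color if it is unhappy."""
    i = rng.randrange(len(labels))
    if not is_happy(labels, i, w):
        labels[i] = "o" if labels[i] == "x" else "x"
        return True
    return False
```

Unlike the swap dynamics, this process does not conserve the number of x's, which is why the comparison needs the ratio of unhappy x's to unhappy o's to stay close to 1.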

3.1 Bounding average run length 

As discussed above, to analyze the formation of firewalls we will first define a structure called a 
firewall incubator that has a reasonable probability of becoming a firewall in the long run. An 
incubator is a region with substantially more x sites than o sites (or vice versa). In such regions, 
the minority individuals are unhappy and will continue to move out unless nearby neighborhoods 
have developed an opposite bias. Using a random walk analysis, we will argue that it is reasonably 
likely that all the minority individuals move out before this happens, and so the region turns into 
a firewall. 

The Birth of an Incubator A firewall incubator, defined formally in Definition 1 below, consists
of a sequence of blocks: two defender blocks flanking an internal block. The blocks of size w on 
either side of the firewall incubator are called attacker blocks, and they play a key role in our 
analysis. Defender and internal blocks are the regions that will potentially become firewalls. The 
attacker blocks are the nearby neighborhoods that might impede the process of the defenders 
becoming firewalls. The internal blocks are also biased in the same direction as the defender blocks 
and so help guarantee that the defender is not attacked from both sides. 

To specify our exact requirements on the biases, we associate a sign, +1 or −1, with x and
o, respectively, and define the x-bias $\beta_t(i)$ of a node i at time t to be the sum of the signs of the
w closest nodes on either side of the element and the sign of the element itself. We will write $\beta(i)$
when the time is clear from context. The x-bias directly expresses the element's happiness: an
x-element i is happy if and only if $\beta(i) > 0$ and an o-element j is happy if and only if $\beta(j) < 0$.²
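The x-bias and the happiness criterion it encodes can be written down directly. The sketch below uses hypothetical names (`x_bias`, `is_happy_by_bias`) and assumes the labeling is a string or list over 'x' and 'o' indexed cyclically.

```python
def x_bias(labels, i, w):
    """Sum of the signs (+1 for x, -1 for o) of the 2w+1 nodes within distance w of i."""
    n = len(labels)
    return sum(1 if labels[(i + d) % n] == "x" else -1 for d in range(-w, w + 1))


def is_happy_by_bias(labels, i, w):
    """An x is happy iff its x-bias is positive; an o is happy iff its x-bias is negative."""
    bias = x_bias(labels, i, w)
    return bias > 0 if labels[i] == "x" else bias < 0
```

Because the bias sums an odd number of ±1 terms it is never zero, so exactly one of the two strict inequalities holds at every node.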

Definition 1. A firewall incubator is a block F made up of three consecutive blocks $D_L, I, D_R$
(called left defender, internal, and right defender, respectively) such that:

1. $D_L$ and $D_R$ each have exactly w + 1 nodes;

2. $\beta_0(i) > \sqrt{w}$ for all $i \in F$;

3. The minimum x-bias in $D_L$ occurs at its left endpoint, and the minimum x-bias in $D_R$ occurs
at its right endpoint.

The blocks of length w immediately to the left and right of F are denoted by $A_L, A_R$ and are called
the left and right attackers.
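The three conditions of Definition 1 translate into a direct check on an initial labeling. The following is a sketch under assumptions: the function name `is_x_incubator`, the (start, length) encoding of the block F, and the convention that F must be long enough to hold two disjoint defenders (length at least 2(w + 1), with a possibly empty internal block) are all choices made for this example. Ties for the minimum bias are accepted as long as the endpoint attains the minimum.

```python
import math


def x_bias(labels, i, w):
    """Sum of the signs (+1 for x, -1 for o) of the 2w+1 nodes within distance w of i."""
    n = len(labels)
    return sum(1 if labels[(i + d) % n] == "x" else -1 for d in range(-w, w + 1))


def is_x_incubator(labels, start, length, w):
    """Check Definition 1 for the block of `length` nodes beginning at `start`."""
    n = len(labels)
    if length < 2 * (w + 1) or length > n:
        return False  # no room for two defenders of w + 1 nodes each
    biases = [x_bias(labels, (start + d) % n, w) for d in range(length)]
    # Condition 2: every node of F has initial x-bias greater than sqrt(w).
    if any(b <= math.sqrt(w) for b in biases):
        return False
    # Condition 3: the defenders' minimum biases sit at the outer endpoints of F.
    left_defender, right_defender = biases[: w + 1], biases[-(w + 1):]
    return left_defender[0] == min(left_defender) and right_defender[-1] == min(right_defender)
```

The attackers do not appear in the check itself; they only enter through the biases of the nodes near the endpoints of F.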

For an example of an incubator, see Figure 1. Firewall incubators occur fairly frequently in
the initial configuration. In fact, Proposition 3.1 below proves that every block of size 6w in the
initial configuration has probability $\Omega(1)$ of containing a firewall incubator. The proof consists of
partitioning the block (along with its neighboring attacker blocks) into sub-blocks of length w and
showing that the property $\forall i\ \beta_0(i) > \sqrt{w}$ is implied by conditions on the partial sums of the sign
sequence in each sub-block. These conditions are verified to hold with constant probability, using
the reflection principle and the central limit theorem. Finally, the proof shows that by choosing
the left and right endpoints of the incubator to be the nodes with minimum bias in the leftmost
(resp. rightmost) sub-block, with constant probability these nodes also have the minimum bias in
$D_L, D_R$, respectively.

2 Since $\beta(i)$ is the sum of 2w + 1 labels, it is always odd, and so one of these strict inequalities must hold.

Figure 1: A firewall incubator, surrounded by left and right attacking blocks,
x-bias of a position.

Proposition 3.1. Let r be an integer such that $6 \le r < \frac{n}{w} - 2$. For any sequence of rw consecutive
nodes, the probability that a uniformly random {x,o}-labeling of the nodes contains an x-firewall
incubator that starts among the leftmost w nodes and ends among the rightmost w nodes is at least
$c^r$, where c > 0 is a constant independent of r, w, n.

To prove the proposition, we start by defining some notation. As above, we associate a value of +1
to a node labeled with x and −1 to a node labeled with o. If B is any sequence in $\{x,o\}^k$, we use
$\chi_j(B)$ to denote the sum of the first j associated signs, for $0 \le j \le k$. We also use $\chi(B) = \chi_k(B)$
to denote the sum of all signs in B, and $\chi_{-j}(B) = \chi(B) - \chi_{k-j}(B)$ to denote the sum of the final
j signs.



Definition 2. A sequence $B \in \{x,o\}^w$ is x-promoting if $\chi(B) \ge 5\sqrt{w}$ and for all $j = 1, \ldots, w$,
$\chi_j(B) > -2\sqrt{w}$ and $\chi_{-j}(B) > -2\sqrt{w}$.
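Definition 2 is a condition on the total sum and on all prefix and suffix sums of a sign sequence, which makes it easy to test. The sketch below is illustrative only: the function names are hypothetical, sequences are assumed to be given as lists of +1 (for x) and −1 (for o), and the inequalities are read as "total at least 5√w, every prefix and suffix sum strictly greater than −2√w", which is an assumption about the intended strictness. The Monte Carlo helper merely estimates the probability that Lemma 3.2 bounds from below; it proves nothing.

```python
import math
import random
from itertools import accumulate


def is_x_promoting(signs):
    """Check the x-promoting condition for a list of +1/-1 signs of length w."""
    w = len(signs)
    root = math.sqrt(w)
    prefix_sums = list(accumulate(signs))            # chi_1(B), ..., chi_w(B)
    suffix_sums = list(accumulate(reversed(signs)))  # chi_{-1}(B), ..., chi_{-w}(B)
    return (
        prefix_sums[-1] >= 5 * root
        and all(s > -2 * root for s in prefix_sums)
        and all(s > -2 * root for s in suffix_sums)
    )


def estimate_promoting_probability(w, trials, seed=0):
    """Fraction of uniformly random sign sequences of length w that are x-promoting."""
    rng = random.Random(seed)
    hits = sum(is_x_promoting([rng.choice((1, -1)) for _ in range(w)]) for _ in range(trials))
    return hits / trials
```

Note that reversing a sequence swaps its prefix and suffix sums, so the reverse of an x-promoting sequence is again x-promoting.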



Lemma 3.2. The probability that a uniformly random sequence $B \in \{x,o\}^w$ is x-promoting is
$\Omega(1)$.

Proof. For k > 0, let $\mathcal{E}_k^L, \mathcal{E}_k^R$ denote the events

$$\mathcal{E}_k^L = \{\exists j\ \ \chi_j(B) \le -k\}$$
$$\mathcal{E}_k^R = \{\exists j\ \ \chi_{-j}(B) \le -k\}$$

The probabilities of these events can be calculated using the reflection principle [26], which can be
phrased in this context as follows: if B is a uniformly random element of $\{x,o\}^w$ then for all k > 0,

$$\Pr(\mathcal{E}_k^L) = \Pr(\chi(B) \le -k) + \Pr(\chi(B) < -k).$$

By symmetry, the right side is equal to $\Pr(\chi(B) \le -k) + \Pr(\chi(B) > k)$. Using Markov's inequality,
and the fact that $\chi(B)$ is a sum of w independent random signs and hence $E[(\chi(B))^2] = w$, we
now obtain

$$\Pr(\mathcal{E}_k^L) \le \Pr((\chi(B))^2 \ge k^2) \le \frac{w}{k^2},$$

for all k > 0. In particular, the right side is 1/4 when $k = 2\sqrt{w}$.

It is easy to see that whenever a < b are two numbers such that the events $\chi(B) = a$ and
$\chi(B) = b$ have positive probability, the inequality

$$\Pr(\mathcal{E}_k^L \mid \chi(B) = a) \ge \Pr(\mathcal{E}_k^L \mid \chi(B) = b) \qquad (1)$$

holds. One way to see this is to observe that we can obtain a random sample from the conditional
distribution of B given $\chi(B) = a$ by the following procedure: first draw a random sample from the
conditional distribution of B given $\chi(B) = b$, then select a uniformly random set of $\frac{b-a}{2}$ occurrences
of x in the sequence and change each of them to o. The second stage of the sampling does not
increase any of the partial sums $\chi_j(B)$, so if $\mathcal{E}_k^L$ held in the first stage of the sampling, then it
continues to hold after the second stage. Inequality (1) implies that for any b such that the events
$\chi(B) < b$, $\chi(B) \ge b$ both have positive probability,

$$\Pr(\mathcal{E}_k^L \mid \chi(B) < b) \ge \Pr(\mathcal{E}_k^L \mid \chi(B) \ge b). \qquad (2)$$

Since the unconditional probability of $\mathcal{E}_k^L$ is a weighted average of the left and right sides, we obtain

$$\Pr(\mathcal{E}_k^L \mid \chi(B) \ge b) \le \Pr(\mathcal{E}_k^L) \le \frac{w}{k^2}, \qquad (3)$$

for all b, k. By symmetry,

$$\Pr(\mathcal{E}_k^R \mid \chi(B) \ge b) = \Pr(\mathcal{E}_k^L \mid \chi(B) \ge b).$$

By the Central Limit Theorem,

$$\lim_{w \to \infty} \Pr(\chi(B) \ge 5\sqrt{w}) = \frac{1}{\sqrt{2\pi}} \int_5^{\infty} e^{-x^2/2}\, dx$$

and therefore there is an absolute constant $c_0$ such that $\Pr(\chi(B) \ge 5\sqrt{w}) \ge 2c_0$ for all $w \ge 25$.
(The restriction to $w \ge 25$ is necessary so that $w \ge 5\sqrt{w}$.) Now we find that for $b = 5\sqrt{w}$ and
$k = 2\sqrt{w}$,

$$\Pr(B \text{ is x-promoting}) = \Pr\left(\chi(B) \ge b \,\wedge\, \overline{\mathcal{E}_k^L} \,\wedge\, \overline{\mathcal{E}_k^R}\right)$$
$$\ge \Pr(\chi(B) \ge b) \cdot \left[1 - 2\Pr(\mathcal{E}_k^L \mid \chi(B) \ge b)\right]$$
$$\ge 2c_0\left[1 - 2\left(\tfrac{1}{4}\right)\right] = c_0.$$

□
Now we proceed to the proof of Proposition 3.1.



Proof of Proposition 3.1. Given a sequence of rw consecutive nodes, for $6 \le r < \frac{n}{w} - 2$, partition it
into r blocks $B_1, B_2, \ldots, B_r$ each containing w consecutive nodes. Let $B_0, B_{r+1}$ denote the blocks
of w nodes immediately preceding $B_1$ and immediately following $B_r$, respectively. If $\Lambda = \Lambda_{00}$ is any
labeling of the nodes in $B_0, B_1, \ldots, B_{r+1}$, then let $\Lambda_{01}, \Lambda_{10}, \Lambda_{11}$ respectively denote the labelings
obtained from $\Lambda$ by reversing the ordering of the labels of the first 4w nodes, the last 4w nodes, or
both sets of nodes.

With probability at least $c_0^{r+2}$, the r + 2 blocks that constitute $\Lambda_{00}$ are all x-promoting. The
reverse of an x-promoting sequence is also x-promoting, so when this event happens it also happens
that $\Lambda_{01}, \Lambda_{10}, \Lambda_{11}$ are made up entirely of x-promoting blocks. Furthermore, by symmetry, all
labelings in the set $\{\Lambda_{00}, \Lambda_{01}, \Lambda_{10}, \Lambda_{11}\}$ are equiprobable given this event. If we can show that at
least one of these four labelings has an x-firewall incubator that starts in $B_1$ and ends in $B_r$ then we
will have shown that the probability of such an incubator existing is at least $\frac{1}{4} c_0^{r+2}$, thus establishing
the proposition.

Recall the bias of a node, $\beta(i)$, defined as the sum of the 2w + 1 signs associated to the nodes
within distance w of i, including i itself. Note that if i belongs to the middle block in a sequence of
three consecutive x-promoting blocks, then $\beta_0(i) > \sqrt{w}$. In fact, letting B, B', B'' denote the three
blocks and letting j denote the position of i within B' (counted from the left, starting at 1), we have

$$\beta_0(i) = \chi(B') + \chi_{-(w-j+1)}(B) + \chi_j(B'') > 5\sqrt{w} - 2\sqrt{w} - 2\sqrt{w} = \sqrt{w}$$

by the definition of an x-promoting block. In particular, our assumption that all of the blocks
constituting the labelings $\{\Lambda_{00}, \Lambda_{01}, \Lambda_{10}, \Lambda_{11}\}$ are x-promoting implies that every node in the blocks
$B_1, \ldots, B_r$ has bias greater than $\sqrt{w}$ in all four of the labelings.

When i belongs to $B_1$ or $B_2$, all of the nodes within distance w of i belong to $B_0 \cup B_1 \cup B_2 \cup B_3$.
Thus, when we reverse the ordering of labels of those 4w nodes, the resulting sequence of biases in
$B_1 \cup B_2$ is also reversed. In particular, we can ensure that the set $M_\ell = \arg\min\{\beta_0(i) \mid i \in B_1 \cup B_2\}$
intersects $B_1$ by retaining the labeling of $B_0, \ldots, B_3$ as in $\Lambda_{00}$ or taking the reverse ordering.
Similarly, we can ensure that the set $M_r = \arg\min\{\beta_0(i) \mid i \in B_{r-1} \cup B_r\}$ intersects $B_r$ by
retaining the labeling of $B_{r-2}, \ldots, B_{r+1}$ or taking the reverse ordering. It follows that in at least
one of the labelings $\{\Lambda_{00}, \Lambda_{01}, \Lambda_{10}, \Lambda_{11}\}$, the sets $M_\ell \cap B_1$ and $M_r \cap B_r$ are both nonempty. When
this happens, by construction, any sequence of nodes starting in $M_\ell \cap B_1$ and ending in $M_r \cap B_r$
is an x-firewall incubator. □

The Lifecycle of an Incubator We focus on the interaction between the attacker and defender 
blocks on one side of a firewall incubator. At every point in time, some swap of two nodes is 
proposed. This swap contributes to the construction of a firewall in the defender if it involves an 
o moving out of the defender, and hinders the construction if it involves an x moving out of the 
attacker. We need to show that good moves (o's moving from the defender) happen sufficiently 
frequently and early in the process. For the remainder of this section, we focus on moves in left 
attacker/defenders; similar statements hold for right attacker/defenders. 

More formally, we introduce a notion called satisfaction time which indicates the first time at 
which an element is selected for a move. 

Definition 3. The satisfaction time of a node i, denoted by t*, is defined to be the first time when 
i is selected to participate in a proposed swap with an unhappy, oppositely labeled individual. (If no 
such time exists, then t* = ∞.) A node i is called impatient at time t if it is unhappy and t < t*.



Note that the element at i may not actually participate in a swap at time t*, since it will only
participate in a swap if it is unhappy. In particular, an attacking x-element $i \in A_L$ may swap at its
satisfaction time, swap at a later point, or not swap at any point. However, a defending o-element
$i \in D_L$ is guaranteed to swap at its satisfaction time if its bias and those of all its neighbors in the
defender and internal blocks remain positive up until and including its own satisfaction time. The
initial bias of any such i is, by the definition of an incubator, at least $\sqrt{w}$, and so the guarantee holds
so long as enough attacking x-elements remain when the o-element's satisfaction time is reached.
To capture this intuition mathematically, we make the following definitions.

Definition 4. For a firewall incubator $F = D_L \cup I \cup D_R$ with corresponding attackers $A_L, A_R$,
a left attacking x is an individual of type x who belongs to $A_L$ in the initial configuration, and a
left defending o is an individual of type o who belongs to $D_L$ in the initial configuration. A left
combatant is an individual that is either a left attacking x or a left defending o. The equivalent terms
with "right" in place of "left" are defined similarly; henceforth when referring to combatants we will
omit "left" and "right" when they can be inferred from context. The numbers of left attacking x's
and left defending o's are denoted by $a_L, d_L$, and for the right combatants we define $a_R, d_R$ similarly.

Definition 5. The left-transcript (resp. right-transcript) is the sign sequence obtained by listing
all of the left (resp. right) combatants in reverse order of satisfaction time, and translating each
attacking x in this list to +1 and each defending o to −1.

If there exists a time to o,t which no individuals in F are impatient, any sign sequence obtained 
from the left-transcript (resp. right-transcript) by permuting the signs associated to individuals 
whose satisfaction time is after to, while fixing all other signs in the transcript, is called a left- 
pseudo-transcript (resp. right-pseudo-transcript ). 
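
As an illustration of Definition 5, the following minimal Python sketch builds a transcript from a
list of combatants and checks whether every running sum of its first $k$ signs is non-negative. The
representation of a combatant as a (satisfaction_time, kind) pair and the function names are
assumptions made for this example; they are not notation from the paper.

```python
from itertools import accumulate

def transcript(combatants):
    """Return the sign sequence of Definition 5.

    combatants: list of (satisfaction_time, kind) pairs, kind in {"x", "o"}.
    Combatants are listed in reverse order of satisfaction time; an
    attacking x becomes +1 and a defending o becomes -1.
    """
    ordered = sorted(combatants, key=lambda c: c[0], reverse=True)
    return [+1 if kind == "x" else -1 for _, kind in ordered]

def all_partial_sums_nonnegative(signs):
    """True if the sum of the first k signs is >= 0 for every k."""
    return all(s >= 0 for s in accumulate(signs))

# Hypothetical example: three attacking x's and two defending o's.
example = [(5, "x"), (2, "o"), (9, "x"), (7, "o"), (1, "x")]
signs = transcript(example)
```

Here the latest satisfaction time comes first in the list, so a transcript with non-negative
partial sums is one in which, reading backwards in time, attackers never fall behind defenders.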

The relevance of pseudo-transcripts will only become clear much later, when we prove
Proposition 3.4. They constitute a relaxation of transcripts that encodes almost all of the relevant
information in the transcripts, since they differ from transcripts only by a permutation that is, in some
sense, irrelevant, yet they turn out to be easier to work with probabilistically.

Define the $k$-th partial sum of a sequence to be the sum of its first $k$ elements. Our result follows
from the following two main propositions.

Proposition 3.3. Suppose that $F$ is a firewall incubator and there exist left- and right-pseudo-
transcripts such that all partial sums of both pseudo-transcripts are non-negative. Then $F$ becomes
an x-firewall.

Proof. The proof is by contradiction. Let $t_0$ denote the earliest time at which no individual in $F$
is impatient; such a $t_0$ exists by our hypothesis that left- and right-pseudo-transcripts exist. If $F$
is not an x-firewall at time $t_0$, then some node $j \in F$ contains an individual of type o at that time.
There must exist a time $t_1 \le t_0$ at which $j$ is occupied by a happy individual of type o. The proof
is by case analysis: $j$ is not impatient at time $t_0$, so either it is happy or $t^* \le t_0$. If $j$ is happy, set
$t_1 = t_0$. Otherwise, if the type-o individual occupying node $j$ at time $t_0$ has never moved, then set
$t_1 = t^*$; the type-o individual occupying $j$ must have been happy at time $t_1$ or else he would have
moved at that time. Finally, if the type-o individual occupying node $j$ at time $t_0$ is not the original
occupant, then let $t_1$ denote the time immediately after he moved to location $j$; by the definition
of the swap rule, this means $j$ was occupied by a happy individual of type o at time $t_1$.

Consider the first node in $F$ to develop a negative x-bias, and let $t$ denote the time when this
happens. Note that $t \le t_1$ since node $j$ must have negative x-bias at time $t_1$ in order for the
type-o individual occupying it to be happy. Up until time $t$, the biases of all nodes in $F$ are positive, and
so the only swaps in $F$ involve o-elements moving out. Such swaps cannot decrease the x-bias of
nearby nodes, and since the x-bias of nodes in $I$ is completely determined by the labels of nodes
in $D_L \cup I \cup D_R = F$, we conclude that the first node to develop a negative x-bias is not in $I$, but
rather must be in $D_L$ or $D_R$. Without loss of generality, suppose that it is node $i \in D_L$. From
the definition of a firewall incubator, the initial x-bias of $i$, $\beta_0(i)$, was bounded below by the x-bias
of the leftmost node in $D_L$. The set of neighbors of the leftmost node in $D_L$ (including the node
itself) is $A_L \cup D_L$, so the x-bias of the leftmost node can be expressed in terms of the number of
attacking x's and defending o's, $a_L$ and $d_L$, by the formula

\[ a_L - (w - a_L) + (w + 1 - d_L) - d_L = 2(a_L - d_L) + 1. \]

Hence, $\beta_0(i) \ge 2(a_L - d_L) + 1$. Up until time $t$, swaps involving elements in $I$ do not decrease
the x-bias of $i$; we will ignore such swaps in the remainder of the proof. In $D_L$, a swap happens
before time $t$ if and only if it involves an o-element. In particular, up until time $t$, whenever the
satisfaction time of an o-element in $D_L$ is reached, it swaps out, increasing the x-bias of $i$ by 2. In
the $A_L$ block, whenever an x-element swaps out, it decreases the x-bias of $i$ by 2. This happens in
one of three ways:

1. An attacking x (i.e. an x-element that was present in the initial state) swaps out at its
satisfaction time (and before $t$). This contributes $-2$ to the x-bias of node $i$.

2. An attacking x swaps out after its satisfaction time (but still before $t$). Again this contributes
$-2$ to the x-bias of node $i$.

3. An x-element swaps into the attacker block, becomes unhappy, and later swaps out (all before
time $t$). In this case, the element contributes $+2$ to the x-bias of $i$ when it swaps in and $-2$ when
it swaps out, so the total contribution at time $t$ is 0.

Let $a_L^t$ be the number of attacking x's in $A_L$ whose satisfaction time is before $t$. The above shows
that the decrement to the x-bias of $i$ due to swaps of elements in $A_L$ is at most $2a_L^t$. Similarly
define $d_L^t$ to be the number of defending o's in $D_L$ whose satisfaction time is before $t$. Then we
have that the x-bias of $i$ at time $t$ satisfies:

\begin{align*}
\beta_t(i) &\ge \beta_0(i) + 2d_L^t - 2a_L^t \\
&\ge 2(a_L - d_L) + 1 + 2d_L^t - 2a_L^t \\
&> 2 \cdot \left[ (a_L - a_L^t) - (d_L - d_L^t) \right]. \tag{4}
\end{align*}

Recall that $a_L$ is the number of attacking x's and similarly $d_L$ is the number of defending o's. Thus
$a_L - a_L^t$ is the number of attacking x's whose satisfaction time is greater than $t$, and similarly
$d_L - d_L^t$ is the number of defending o's whose satisfaction time is greater than $t$. Thus, the right
side of (4) is twice the $k$-th partial sum of the left-transcript, where $k = (a_L - a_L^t) + (d_L - d_L^t)$
denotes the number of individuals whose satisfaction time is after $t$. As $t$ is earlier than $t_0$, the
earliest time at which no individuals in $F$ are impatient, any left-pseudo-transcript differs from the
left-transcript only by permuting a subset of the first $k$ signs, and therefore has the same $k$-th partial
sum. Now our assumption that the x-bias of $i$ becomes negative at $t$ contradicts the hypothesis
that there exists a left-pseudo-transcript whose partial sums are all non-negative. □
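
The formula $2(a_L - d_L) + 1$ for the x-bias of the leftmost node in $D_L$, used in the proof above,
can be checked numerically. The sketch below assumes, as that formula suggests, that the x-bias of a
node is the number of x's minus the number of o's among the $2w + 1$ nodes in its window, with $A_L$
contributing $w$ nodes and $D_L$ contributing $w + 1$ nodes. The helper names are hypothetical.

```python
import random

def x_bias(labels):
    """Number of x's minus number of o's in a window of labels."""
    return sum(+1 if c == "x" else -1 for c in labels)

def check_identity(w, trials=1000, seed=0):
    """Compare the window count with 2*(a_L - d_L) + 1 on random labelings."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = [rng.choice("xo") for _ in range(w)]      # attacker block A_L (w nodes)
        D = [rng.choice("xo") for _ in range(w + 1)]  # defender block D_L (w + 1 nodes)
        a_L = A.count("x")  # attacking x's
        d_L = D.count("o")  # defending o's
        if x_bias(A + D) != 2 * (a_L - d_L) + 1:
            return False
    return True
```

The identity holds for every labeling, not only typical ones, because it is pure counting:
$A_L$ contributes $a_L - (w - a_L)$ and $D_L$ contributes $(w + 1 - d_L) - d_L$.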
The proof of the next proposition appears in Appendix A.1.

Proposition 3.4. If $B$ is a random block of length $6w$, then with probability $\Omega(1/w)$, $B$ contains a
firewall incubator having left- and right-pseudo-transcripts whose partial sums are all non-negative.



The following gives the intuition behind the proof. First, we know from Proposition 3.1 that
with constant probability, $B$ contains a firewall incubator $F$. Let us focus on the left-transcript of
$F$. First assume (unjustifiably) that the transcript is a uniformly random permutation of the $a_L$
$+1$'s and $d_L$ $-1$'s. Then, by the following simple probabilistic lemma known as the Ballot Theorem,
its partial sums are all non-negative with probability at least $(a_L - d_L)/(a_L + d_L)$ which, by the
definition of a firewall incubator, is $\Omega(\sqrt{1/w})$.

Lemma 3.5 (Ballot Theorem). Consider a multiset consisting of $a$ copies of $+1$ and $b$ copies
of $-1$, and let $x_1, x_2, \ldots, x_{a+b}$ be a uniformly random ordering of the elements of this multiset.
The probability that all partial sums $x_1 + \cdots + x_j$ ($1 \le j \le a + b$) are strictly positive is equal to
$\max\{0, \frac{a-b}{a+b}\}$.

The theorem was first proved in 1887. One elegant proof, originally due to Dvoretzky
and Motzkin [22], is presented in Appendix A.2.
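
For small $a$ and $b$ the statement of Lemma 3.5 can be confirmed by brute force. The sketch below
enumerates every distinct arrangement of the multiset (each is equally likely under a uniformly
random ordering) and compares the exact fraction of arrangements with strictly positive partial
sums against $\max\{0, \frac{a-b}{a+b}\}$. The function names are ours and purely illustrative.

```python
from fractions import Fraction
from itertools import accumulate, combinations

def ballot_probability(a, b):
    """Exact probability that all partial sums are strictly positive.

    Enumerates the positions of the b copies of -1 among a + b slots;
    requires a + b >= 1.
    """
    n = a + b
    good = total = 0
    for minus_positions in combinations(range(n), b):
        minus = set(minus_positions)
        seq = [-1 if i in minus else +1 for i in range(n)]
        total += 1
        good += all(s > 0 for s in accumulate(seq))
    return Fraction(good, total)

def ballot_formula(a, b):
    """The closed form stated in Lemma 3.5."""
    return max(Fraction(0), Fraction(a - b, a + b))
```

For instance, with $a = 2$ and $b = 1$ only the arrangement $+1, +1, -1$ qualifies out of three,
matching $(2 - 1)/(2 + 1)$.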



Unfortunately, the transcript is not a uniformly random permutation. A bias arises since the
numbers of unhappy elements of each type are not precisely equal. If at some point there are more
unhappy o's, say, than x's, then the satisfaction time of an attacking x is more likely to happen
earlier. In Section 3.2, we will show that the number of unhappy elements is approximately balanced
for a sufficiently long time. Now at any point in time, we artificially correct the small imbalance
as follows: suppose there are $m$ extra unhappy elements of one type, say x. Then choose $m$
unhappy x-elements at random and call them censored. Call a swap censored if it involves a
censored element. (There is actually one more technicality here: we also want to censor swaps
if both elements are combatants of $F$. This necessitates a subtle modification to the censorship
construction, with no significant quantitative consequences for the proof.) Since swaps are between
random unhappy elements, as long as the imbalance is small, the probability that a swap is censored
is also small. Conditioning on having no censored swaps, the transcript is indeed a uniformly
random permutation. There is a stopping time $t_0$ at which the imbalance ceases to be small, and
we cannot guarantee that censored swaps are unlikely after $t_0$; however, we can show that with
probability $1 - o(1)$, no individual in $F$ is impatient at time $t_0$. Consequently, we can obtain a
pseudo-transcript by randomly permuting the combatants whose satisfaction times are after $t_0$, and
provided that no censored swaps occurred before $t_0$, the pseudo-transcript is a uniformly random
permutation. Then, Proposition 3.4 follows from Lemma 3.5.



Our main theorem follows fairly directly from Proposition 3.3 and Proposition 3.4.



Theorem 1. Consider the segregation process with window size $w$ on a ring network of size $n$,
starting from a uniformly random initial configuration. There exists a constant $c < 1$ and a function
$n_0 : \mathbb{N} \to \mathbb{N}$ such that for all $w$ and all $n > n_0(w)$, with probability $1 - o(1)$, the process reaches
a configuration after finitely many steps in which no further swaps are possible. The average run
length in this final configuration is $O(w^2)$. In fact, the distribution of runlengths in the final
configuration is such that for all $\lambda > 0$, the probability of a randomly selected node belonging to a
run of length greater than $\lambda w^2$ is bounded above by $c^\lambda$.

Proof. By Proposition 2.1, with high probability the process reaches a frozen configuration in which
no further swaps are possible. To bound the distribution of runlengths in the frozen configuration,
consider a randomly sampled site, once again denoted by $a$. As we scan clockwise from $a$ in the
initial configuration, let us divide the ring into disjoint blocks of length $6w$. Each of these blocks has
probability $\Omega(1/w)$ of containing an x-firewall incubator having left- and right-pseudo-transcripts
whose partial sums are all non-negative (Proposition 3.4). Of course, similar statements hold with
o in place of x. Thus, for a suitable constant $c < 1$, the probability that none of the first $\lambda w/6$
blocks encountered on a clockwise scan of length-$(6w)$ blocks starting from $a$ contain x-firewalls
in the final frozen configuration is bounded above by $c^\lambda$. The same conclusion holds with o in
place of x and with counterclockwise in place of clockwise, by symmetry. Node $a$ cannot belong to
a monochromatic run of length greater than $\lambda w^2$ assuming that it has individuals of both labels
within this radius on both sides of itself, and this completes the proof. □

3.2 Bounding number of unhappy elements 

This section sketches a proof that the numbers of unhappy x's and o's remain nearly balanced until
late in the segregation process. More precisely, define a stopping time $T_0$ to be the earliest time when
fewer than $3n/w^2$ individuals are impatient. Theorem 2 below asserts that when the ring size $n$ is
sufficiently large, it holds with high probability that at all times $t < T_0$ the numbers of unhappy x's
and o's differ by at most $n/w^4$. The full proof of the theorem is given in Appendix A.3. The theorem
statement is plausible because the entire stochastic process is symmetric under interchanging the
roles of x and o. Thus, one would expect nearly equal numbers of unhappy x's and o's in the
initial configuration, and one would expect this near-balance to persist for many steps after the
initialization of the stochastic process, since the process itself has no bias in favor of reducing the
number of unhappy x's more rapidly than the number of unhappy o's or vice-versa.
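
The near-balance described above can be observed empirically on small rings. The following is a
minimal simulation sketch, not the construction analyzed in this paper: it assumes that an
individual is unhappy when fewer than $w$ of its $2w$ nearest neighbors share its type, and that each
step swaps a uniformly random unhappy x with a uniformly random unhappy o. The function names,
the parameter max_steps, and the requirement $n > 2w$ are assumptions of the example.

```python
import random

def unhappy_sites(labels, w):
    """Map each type to the sites whose occupant has fewer than w
    same-type individuals among its 2w nearest neighbours (n > 2w)."""
    n = len(labels)
    result = {"x": [], "o": []}
    for i in range(n):
        # Count same-type labels in the window of 2w + 1 sites, then
        # subtract 1 so that the site itself is not counted.
        same = sum(labels[(i + d) % n] == labels[i] for d in range(-w, w + 1)) - 1
        if same < w:
            result[labels[i]].append(i)
    return result

def simulate(n, w, seed=0, max_steps=10_000):
    """Run the swap dynamics; return the final labels and, for each step,
    (number of unhappy x's) - (number of unhappy o's)."""
    rng = random.Random(seed)
    labels = [rng.choice("xo") for _ in range(n)]
    imbalance = []
    for _ in range(max_steps):
        unhappy = unhappy_sites(labels, w)
        imbalance.append(len(unhappy["x"]) - len(unhappy["o"]))
        if not unhappy["x"] or not unhappy["o"]:
            break  # no swap between unhappy individuals of opposite types is possible
        i = rng.choice(unhappy["x"])
        j = rng.choice(unhappy["o"])
        labels[i], labels[j] = labels[j], labels[i]
    return labels, imbalance
```

Plotting the returned imbalance sequence against the step index for a few seeds gives a quick
visual check of how long the two counts stay close; the recomputation of unhappy sites at every
step costs $O(nw)$, so this is only suitable for small $n$.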

We can express the segregation process as a continuous-time process (with state changes only
at times that are multiples of $1/n$) whose state variables encode, for every $k$ and every sequence
$\sigma = (\sigma_0, \ldots, \sigma_k) \in \{x, o\}^{k+1}$, the fraction of sites such that $\sigma$ describes the labeling of that site
and its $k$ nearest clockwise neighbors. Any other parameter depending only on the frequency of occurrence
of certain bounded-size configurations (e.g. the fraction of unhappy x's and o's) can be expressed
as a function of these state variables. In the continuum limit, the state variables are not random at
all; they evolve deterministically according to a system of differential equations. The operation of
interchanging the symbols x and o defines a permutation of the set of state vectors, and the
fixed-point set of this permutation is an invariant set for the differential equation: the derivative
at any such point must also (by symmetry) be preserved under the operation of interchanging x
and o, and therefore the differential equation solution can never exit the set of vectors preserved by
this operation. In the continuum-limit process the initial state belongs to this fixed-point set, and
therefore the state vector at all future times remains invariant under interchanging x and o. This
gives a heuristic justification of the fact that the numbers of unhappy x's and o's remain nearly
equal throughout the segregation process, or until the process becomes so close to "frozen" that
the continuum-limit approximation no longer applies.

Wormald's technique [52] provides a mathematically rigorous method for justifying these
differential-equation approximations and quantifying the approximation error. Using this technique, we derive
the associated differential equation as a limit of discrete-time difference equations and show that
it is quadratic with coefficients independent of $n$. However, our problem resists a straightforward
application of Wormald's technique because it has infinitely many state variables with infinitely
long dependency chains.³

To circumvent this difficulty, we do not directly analyze the segregation process on a ring of size
$n$. Instead, for a function $L(w)$ that grows sufficiently rapidly, we analyze the segregation process
on a disjoint union of $n/L(w)$ rings each having length $L(w)$.⁴ We then bound the error resulting
from approximating a single ring by a collection of bounded-length rings by defining a notion of
"taint" that allows us to couple the single-ring and bounded-length ring versions of the process.
Taint is defined in such a way that for any untainted node, the node and its neighbors have the
same label in both versions of the process. We then bound the number of untainted sites with
high probability using martingale techniques. Combining these arguments, we obtain the following
result:

Theorem 2. Consider the segregation process on a ring of length $n$ with window size $w$. For all $w$
and all sufficiently large $n$ (i.e. all $n > n_0(w)$, for some function $n_0$), with probability greater than
$1 - \frac{1}{w}$, the number of unhappy x's differs from the number of unhappy o's by at most $n/w^4$ at every
time $t < T_0$, where $T_0$ denotes the earliest time when fewer than $3n/w^2$ individuals are impatient.



As mentioned earlier, the full proof is given in Appendix A.3.



4 Open Questions 

In this paper we have shown that the one-dimensional Schelling segregation process with window
size $w$ leads, with high probability, to a "frozen configuration" in which most nodes belong to
monochromatic runs of size at least $\Omega(w)$ and at most $O(w^2)$. We are hopeful that the upper
bound can be improved to $O(w)$, using an extension of the techniques introduced in this paper,
but this strengthening of the result is beyond the scope of the present work. Assuming it is correct
that most nodes belong to monochromatic runs of size $\Theta(w)$, it becomes natural to conjecture that
the distribution of runlengths, normalized by $1/w$, converges to a distribution $F$. In other words,
if $n = n(w)$ grows sufficiently fast as a function of $w$, then for all $r > 0$, as $w \to \infty$ the probability
that a randomly selected node belongs to a run of length less than $rw$ in the frozen configuration
converges to a limit $F(r)$. Proving such a limit theorem seems beyond the reach of the techniques
introduced here, to say nothing of characterizing the precise runlength distribution $F$, if it exists.

Several parts of our analysis hinged on symmetry arguments that are specific to the case in
which x and o are equally likely in the initial configuration, and in which the threshold $\tau$, defining
the fraction of neighbors that must be of the same type as an individual in order for that individual
to be satisfied, is equal to $\frac{1}{2}$. It is quite possible that when one varies either of these assumptions,
the model's behavior is qualitatively different; for example, the lengths of the runs in the frozen
configuration may become exponential rather than polynomial in $w$. Understanding how the
one-dimensional model's behavior varies as we vary the parameter $\tau$ or the x-to-o ratio is an important
question for future work. A related open problem is to analyze the one-dimensional segregation
process when the tolerance threshold $\tau$ may vary from one individual to another.

Finally, and most ambitiously, there is the open problem of rigorously analyzing the Schelling
model in other graph structures including two-dimensional grids. Simulations of the Schelling model
in two dimensions reveal beautiful and intricate patterns that are not well understood analytically.
Perturbations of the model have been successfully analyzed using stochastic stability analysis by
Zhang [55, 56, 57], but the non-perturbed model has not been rigorously analyzed. Two-dimensional
lattice models are almost always much more challenging than one-dimensional ones, and we suspect
that to be the case with Schelling's segregation model. But it is a challenge worth undertaking: if
one is to use the Schelling model to gain insight into the phenomenon of residential segregation, it
is vital to understand its behavior on two-dimensional grids since they reflect the structure of so
many residential neighborhoods in reality.

³ Wormald's technique is applicable to differential equations with state variables $Y_1, Y_2, \ldots$ such that the derivative
of $Y_i$ depends only on $Y_1, \ldots, Y_i$ for all $i$, but unfortunately our variables do not admit such an ordering.

⁴ For simplicity, we assume that $n$ is divisible by $L(w)$. In general, we would have to analyze a disjoint union of
$\lfloor n/L(w) \rfloor$ rings each having length $L(w)$ or $L(w) + 1$.

A Deferred Proofs

In this appendix we present proofs that were deferred from Section 3.



A.1 Proof of Proposition 3.4



We would like to show that a random block evolves into a firewall. We already know that such
a block contains a firewall incubator with constant probability, and that an incubator becomes a
firewall if there are pseudo-transcripts such that all partial sums are non-negative. Thus the crux
of the argument is to show that such pseudo-transcripts exist with sufficiently high probability.
This would follow from Lemma 3.5 if the transcripts were random permutations, but that is not
precisely true. The reason is that the global number of unhappy elements of each type might be
imbalanced, creating an imbalance in the probability of an o-swap versus an x-swap. We correct
this imbalance by censoring certain swaps and then conditioning on the event that the transcript
involves no censored swaps.

We begin with a definition of censored swaps. Fix a block $B$ and a time $t$. Let the padded block
$\mathrm{pad}(B)$ consist of $B$ together with the $w$ nodes on its left and the $w$ nodes on its right. Let $n_x(t, B)$
be the number of unhappy x-elements outside $\mathrm{pad}(B)$ at time $t$, and define $n_o(t, B)$ similarly.
Suppose that $n_x(t, B) \le n_o(t, B)$ (the other case is similar), and let $k = n_o(t, B) - n_x(t, B)$. Then
the censored pairs involving an o-element in $\mathrm{pad}(B)$ are precisely those involving an x-element in
$\mathrm{pad}(B)$. Let $C$ consist of an arbitrary subset of $k$ unhappy o-elements outside $\mathrm{pad}(B)$. For an
x-element inside $\mathrm{pad}(B)$, the censored pairs are those involving an o-element in $\mathrm{pad}(B) \cup C$. All
other pairs are uncensored. A proposed swap is censored if the corresponding pair is censored.

The following properties of this definition will be useful in our proof. The first two properties
will help us prove that transcripts of uncensored swaps are uniformly random permutations. The
third property will be used to show that censored swaps are rare. For the third property to follow,
we need to argue that there are sufficiently many unhappy elements so that the ratio of censored to
uncensored swaps is small. To this end, recall from Theorem 2 that up until time $T_0$, the combined
number of unhappy individuals is at least $3n/w^2$.

Lemma A.1. For any block $B$ and time $t$,

1. every pair of elements in $\mathrm{pad}(B)$ is censored,

2. every element in $\mathrm{pad}(B)$ is in an equal number of uncensored pairs,

3. and if $t < T_0$, then with probability $1 - 1/w$, every unhappy element in $\mathrm{pad}(B)$ has at most
$n/w^4 + |\mathrm{pad}(B)|$ censored and at least $n/w^2 - n/w^4 - |\mathrm{pad}(B)|$ uncensored partners.

Proof. The first two properties follow by construction. The third property follows immediately
from Theorem 2. □

We are now ready to prove Proposition 3.4, which we restate here for convenience.

Proposition A.2. If $B$ is a random block of length $6w$, then with probability $\Omega(1/w)$, $B$ contains a
firewall incubator having left- and right-pseudo-transcripts whose partial sums are all non-negative.

Proof. Let INC denote the event that the initial labeling of $B$ contains a firewall incubator. We know
from Proposition 3.1 that $\Pr(\text{INC}) = \Omega(1)$. Let us fix a firewall incubator $F$ in $B$ (if it exists) for
the remainder of the proof. Consider the transcript $\tau$ that combines the left- and right-transcripts
of $F$, i.e., the sign sequence obtained by listing all of the combatants of $F$ (both right and left)
in reverse order of satisfaction time and translating each combatant in the list to the appropriate
sign. Let $t_0$ be the earliest time, if it exists, at which no individuals in $F$ are impatient. (If no such
time exists, $t_0 = \infty$.) Define the scrambled pseudo-transcript, $S(\tau)$, to be a sign sequence obtained
from $\tau$ by permuting, uniformly at random, the signs associated to nodes whose satisfaction time is
after $t_0$. Let $S(\tau_L), S(\tau_R)$ be the subsequences of $S(\tau)$ corresponding to left- and right-combatants.
Our goal is to show that, with probability $\Omega(1/w)$, $S(\tau_L)$ and $S(\tau_R)$ are left- and
right-pseudo-transcripts for $F$ whose partial sums are all non-negative. Denote this event by NON-NEG; note
that NON-NEG is a subset of INC, since otherwise the incubator $F$ and its combined transcript $\tau$
are not even defined.

We define two more events to be used throughout this proof. Let $\mathrm{pad}(F)$ denote the block
consisting of $F$ and its associated attacker blocks $A_L, A_R$. Recall that $t_0$ is the earliest time at
which no individuals in $F$ are impatient, or $t_0 = \infty$ if no such time exists. Then the two events of
interest are:

• SWAP, the event that no unhappy element of $\mathrm{pad}(B)$ participates in a censored swap before
time $t_0$,

• TIME, the event that $t_0 < T_0$. (Recall that $T_0$ is the earliest time when fewer than $3n/w^2$
individuals are impatient.)

We would like to prove $\Pr[\text{NON-NEG}] = \Omega(1/w)$. We do this using the following pair of claims.

Claim A.3. $\Pr[\text{NON-NEG} \mid \text{SWAP}, \text{INC}] = \Omega(1/w)$.

Proof. Let $i_1, \ldots, i_\ell$ denote the list of combatants in the order that their signs appear in $S(\tau)$.
In other words, the sequence $i_1, \ldots, i_\ell$ consists of a random permutation of the combatants whose
satisfaction time is later than $t_0$, followed by a listing of the others in decreasing order of satisfaction
time. We first show that for any initial labeling of $\mathrm{pad}(F)$, given event SWAP, the list $i_1, \ldots, i_\ell$ is a
uniformly random permutation of the combatants. First note that, by property 1 of Lemma A.1,
combatants only swap with non-combatants, so the list is unique (i.e., there are no two combatants
with equal satisfaction times). To prove that the list is a random permutation, we show that
for every $k$, if we condition on the subsequence $i_{k+1}, \ldots, i_\ell$ (in addition to conditioning on event
SWAP), then $i_k$ is a uniformly random sample from the set $C_k$ of combatants not listed in $i_{k+1}, \ldots, i_\ell$.
This is proven by analyzing two cases. Denote the satisfaction time of $i_k$ by $t^*$. If $t^* > t_0$, then
by construction $i_1, \ldots, i_k$ is a uniformly random permutation of $C_k$, so $i_k$ is a uniformly random
sample from $C_k$. On the other hand, if $t^* \le t_0$, then by property 2 of Lemma A.1, each of the
individuals in $\{i_1, \ldots, i_k\}$ is equally likely to participate in an uncensored swap at time $t^*$.

Note that this means $S(\tau_L)$ and $S(\tau_R)$ are also independent uniformly random permutations
for any initial configuration of $\mathrm{pad}(F)$ given events SWAP, INC. As $\beta_0(i) \ge \sqrt{w}$ for all $i \in F$, and
in particular for the left-most and right-most nodes, we know that $a_L - d_L \ge (\sqrt{w} - 1)/2$ (and similarly
for $a_R - d_R$). Furthermore, $a_L + d_L \le 2w + 1$ (and similarly for $a_R + d_R$). Thus, applying the
Ballot Theorem (Lemma 3.5), we see that the probability $S(\tau_L)$ (similarly $S(\tau_R)$) has non-negative
partial sums given events SWAP and INC is $\Omega(1/\sqrt{w})$. Therefore, by the independence of $S(\tau_L)$ and
$S(\tau_R)$, $\Pr[\text{NON-NEG} \mid \text{SWAP}, \text{INC}] = \Omega(1/w)$ as claimed. □
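
Spelled out, the bound on each of the two factors is the following short calculation. It uses only
the formula $2(a_L - d_L) + 1$ for the initial bias of the left-most node and the fact that $A_L \cup D_L$
contains $2w + 1$ nodes; it is a worked restatement of the step above rather than an additional claim.

\begin{align*}
2(a_L - d_L) + 1 \ge \sqrt{w} \quad &\Longrightarrow \quad a_L - d_L \ge \frac{\sqrt{w} - 1}{2}, \\
\Pr[\text{all partial sums of } S(\tau_L) \text{ are non-negative} \mid \text{SWAP}, \text{INC}]
  &\ge \frac{a_L - d_L}{a_L + d_L} \ge \frac{(\sqrt{w} - 1)/2}{2w + 1} = \Omega(1/\sqrt{w}).
\end{align*}

The same bound holds for $S(\tau_R)$, and multiplying the two independent factors gives
$\Omega(1/\sqrt{w}) \cdot \Omega(1/\sqrt{w}) = \Omega(1/w)$.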

Claim A.4. $\Pr[\text{SWAP}, \text{INC}] = \Omega(1)$ for $w$ sufficiently large.

Proof. Recall the event TIME, that $t_0 < T_0$. We will establish the stronger claim that
$\Pr[\text{SWAP}, \text{TIME}, \text{INC}] = \Omega(1)$ for $w$ sufficiently large.

We first work on bounding $\Pr[\text{SWAP} \mid \text{TIME}, \text{INC}]$. Given event TIME, we know from Theorem 2
that there is at most $O(1/w)$ probability of there being a time $t < t_0$ at which the numbers of
unhappy x's and o's differ by more than $n/w^4$. If there is no such time $t$ then property 3 of
Lemma A.1 implies that the probability the proposed swap for a combatant is censored is at most
$O(1/w^2)$. By the union bound, the probability that any proposed swap in the transcript is censored
is at most $O(1/w)$. Thus, $\Pr[\text{SWAP} \mid \text{TIME}, \text{INC}] = 1 - O(1/w)$.

It remains for us to show that $\Pr[\text{TIME}, \text{INC}] = \Omega(1)$. We will use the equation

\[ \Pr[\text{TIME}, \text{INC}] = \Pr[\text{INC}] - \Pr[\text{INC} \wedge \overline{\text{TIME}}], \tag{5} \]

where $\overline{\text{TIME}}$ denotes the complement of the event TIME. The first term on the right side is
$\Omega(1)$, by Proposition 3.1. To bound the second term, recall that
by definition of $T_0$, the total number of impatient individuals in the ring at time $T_0$ is less than $3n/w^2$. Since
$B$ is a random block of size $6w$, the expected number of impatient individuals in $B$ at time $T_0$ is
less than $18/w$, so by Markov's inequality, the (unconditional) probability that our random block $B$
contains any impatient individual at time $T_0$ is at most $18/w$. However, if event $\text{INC} \wedge \overline{\text{TIME}}$ occurs,
it means that $F$ must contain an impatient individual at time $T_0$. Thus, $\Pr[\text{INC} \wedge \overline{\text{TIME}}] \le 18/w$.
Equation (5) now says that $\Pr[\text{TIME}, \text{INC}] = \Omega(1) - O(1/w) = \Omega(1)$, for large enough $w$. □
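
The $18/w$ bound in the proof above is a one-line expectation computation. Writing $N$ for the number
of impatient individuals in the ring at time $T_0$ and $X$ for the number of them that lie in $B$ (both
symbols are introduced here only for this illustration), and using that each node lies in the random
block $B$ with probability $6w/n$:

\begin{align*}
E[X] = N \cdot \frac{6w}{n} < \frac{3n}{w^2} \cdot \frac{6w}{n} = \frac{18}{w},
\qquad
\Pr[X \ge 1] \le E[X] < \frac{18}{w}.
\end{align*}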

Figure 2: The partial sums of the extended sequence.

Noting that $\Pr[\text{NON-NEG}] \ge \Pr[\text{NON-NEG} \mid \text{SWAP}, \text{INC}] \cdot \Pr[\text{SWAP}, \text{INC}]$ and applying the preceding
two claims finishes the proof. □



A.2 Partial Sums of Random Permutations



This subsection proves Lemma 3.5, the Ballot Theorem. The first proof appeared in 1887, and 
many proofs have been discovered since then. We present one such proof, which is essentially the 
one given by Dvoretzky and Motzkin [22], in order to make our exposition more self-contained. 

Proof. We will prove a stronger assertion: if a fixed sequence $y_1, y_2, \ldots, y_{a+b}$ containing $a$ copies of
$+1$ and $b$ copies of $-1$ is permuted by a random cyclic permutation to obtain $x_1, \ldots, x_{a+b}$, then the
probability that all partial sums $x_1 + \cdots + x_j$ ($1 \le j \le a + b$) are strictly positive is $\max\{0, \frac{a-b}{a+b}\}$.
This suffices to prove the lemma, since a uniformly random permutation composed with a random
cyclic permutation is again uniformly random.

If $a \le b$ then the partial sum $x_1 + \cdots + x_{a+b}$ is non-positive, which implies that the probability
is zero, as claimed. Henceforth assume $a > b$. Extend the sequence $x_1, x_2, \ldots, x_{a+b}$ to a periodic
infinite sequence $x_1, x_2, \ldots$ with period $a + b$, and let $s_j$ be the $j$-th partial sum of this sequence,
$s_j = \sum_{i=1}^{j} x_i$. (When $j = 0$, we define $s_j$ to be zero.) The identity $s_{j+a+b} = s_j + a - b$ implies that
$s_j \to \infty$ as $j \to \infty$. Hence, for any integer $z$ there are at most finitely many $j$ such that $s_j = z$;
define $t(z)$ to be the largest such $j$, unless there is no $j$ with $s_j = z$, in which case $t(z)$ is undefined.
We know at least that $t(z)$ is defined for all $z \ge 0$, because $s_0 = 0$ and $s_{j+1}$ exceeds $s_j$ by at most
1, for all $j$.

Figure 2 illustrates the extension of a sequence and shows for several choices of $z$ the
corresponding value $t(z)$.

Once again using the identity $s_{j+a+b} = s_j + a - b$, we find that $t(z + a - b) = t(z) + a + b$ for all
$z$ such that $t(z)$ is defined. Thus, the image of $t$ (i.e., the set of all $j$ such that $t(z) = j$ for some $z$)
consists of all the non-negative integers in exactly $a - b$ distinct congruence classes modulo $a + b$.
Note that 0 belongs to the image of $t$ if and only if $t(0) = 0$, and that this happens if and only if
all of the partial sums $s_j$ ($1 \le j \le a + b$) are strictly positive. Thus, we devote the rest of the proof
to showing that $\Pr(0 \text{ belongs to the image of } t) = \frac{a-b}{a+b}$.

Let us consider how the image of $t$ changes when we cyclically permute the sequence $x_1, x_2, \ldots, x_{a+b}$.
Denote the permuted sequence by $x'_1, x'_2, \ldots, x'_{a+b} = x_2, x_3, \ldots, x_1$. As before make $x'_1, x'_2, \ldots$ into
an infinite periodic sequence and denote its partial sums by

\[ s'_j = \sum_{i=1}^{j} x'_i = \sum_{i=2}^{j+1} x_i = s_{j+1} - x_1. \]

Let $t'(z)$ be the largest $j$ such that $s'_j = z$. From the formula $s'_j = s_{j+1} - x_1$ we immediately see
that $t'(z) = t(z + x_1) - 1$ for all $z$ such that both sides are defined. Thus, the congruence classes
in the image of $t'$ are obtained from those in the image of $t$ by subtracting 1 modulo $a + b$. As
we run through all of the cyclic permutations of the sequence $x_1, x_2, \ldots, x_{a+b}$, we shift the set of
congruence classes in the image of $t$ by subtracting each element of $\mathbb{Z}/(a + b)$. Since the image of $t$
contains exactly $a - b$ congruence classes, the probability that the image contains 0 when we apply
a random cyclic permutation is exactly $\frac{a-b}{a+b}$. □
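
The stronger cyclic statement proved above is easy to test on concrete sequences: for a fixed
sequence with $a$ copies of $+1$ and $b$ copies of $-1$, exactly $\max\{0, a - b\}$ of its $a + b$ cyclic
rotations should have all partial sums strictly positive. The sketch below lists those rotations;
the function name is ours and purely illustrative.

```python
from itertools import accumulate

def good_rotations(seq):
    """Return the rotation offsets r for which every partial sum of
    seq[r:] + seq[:r] is strictly positive."""
    n = len(seq)
    return [r for r in range(n)
            if all(s > 0 for s in accumulate(seq[r:] + seq[:r]))]

# Example with a = 5 copies of +1 and b = 2 copies of -1.
example = [1, -1, 1, 1, -1, 1, 1]
offsets = good_rotations(example)
```

For the example sequence the qualifying offsets are 2, 5 and 6, that is, $a - b = 3$ of the 7
rotations, in agreement with the probability $\frac{a-b}{a+b} = \frac{3}{7}$.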

A.3 Applying Wormald's Technique

This section provides formal proofs of the arguments in Section 3.2.



Tainted nodes and bounded-length ring approximation. Assume now that $n$ is divisible
by $L = L(w)$, and consider a modified segregation process in which the nodes are still numbered
$1, 2, \ldots, n$, but the graph structure $G$ is now a union of disjoint cycles of length $L$. Specifically,
nodes $i$ and $j$ are connected if and only if $\lfloor i/L \rfloor = \lfloor j/L \rfloor$ and $i \equiv j \pm 1 \pmod{L}$. Note that for large
$L$, the overwhelming majority of nodes have the same set of neighbors in $G$ as in the $n$-cycle; the
exceptions are those $i$ that belong to one of the congruence classes $-w + 1, -w + 2, \ldots, w - 2, w - 1$
mod $L$.

We will couple the segregation process in $G$ with the segregation process in the $n$-cycle $C_n$ in
the obvious way: every time we choose two locations $i, j$ for a proposed swap in $C_n$, we choose the
same pair of locations $i, j$ for a proposed swap in $G$. Starting from identical labelings at time 0,
differences in the two labelings will emerge whenever the coupling proposes a swap between two
locations that are both unhappy in one graph but not the other. To bound the rate at which such
differences develop, we define the set of tainted nodes at time $t$, $D(t)$, by the following inductive
definition. At $t = 0$ the set of tainted nodes consists of all the nodes $i$ belonging to the congruence
classes $-w+1, -w+2, \ldots, w-2, w-1$ mod $L$. If the two nodes $i, j$ that are chosen for a proposed
swap at time $t > 0$ are both untainted, then $D(t) = D(t-1)$. Otherwise $D(t)$ is the union of
$D(t-1)$ with the set of all nodes that are $w$-step neighbors of either $i$ or $j$ in either $G$ or $C_n$. Note
that

$$|D(t) \setminus D(t-1)| \le 6w \tag{6}$$

since the number of $w$-step neighbors of any one node is $2w$ in each graph, and at least $w$ of them
are $w$-step neighbors in both graphs.

Lemma A.5. At any time $t$ in the coupling of the segregation processes on $G$ and $C_n$, if node $i$ is
untainted, then $i$ and all of its $w$-step neighbors have the same label in both graphs.

Proof. The proof is an easy induction on $t$. When $t = 0$ this follows from the fact that both graphs
have the same initial labeling, and a node has the same $w$-step neighbor set in both graphs unless
it belongs to $D(0)$. When $t > 0$, the only nodes whose $w$-step neighborhood may experience a label
change are those located within $w$ steps of the nodes $i, j$ that were selected for the proposed swap.
If $i$ or $j$ is tainted at time $t-1$ then all such nodes become tainted at time $t$. If both $i$ and $j$ are
untainted at time $t-1$, then the induction hypothesis implies that the proposed swap affects the
labeling of both graphs in the same way. □

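Lemma A.6. The expected number of tainted nodes at time $t \ge 0$ is bounded above by $e^{12wt/n}\left(\frac{2w-1}{L}\right) n$.
The probability that $|D(t)| > 2e^{12wt/n}\left(\frac{2w-1}{L}\right) n$ is at most $\exp\left(-\frac{n}{36w}\left(\frac{2w-1}{L}\right)^2\right)$.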



Proof. Let $i, j$ be the pair of locations that are chosen for a proposed swap at time $t+1$. Let $d(t) =
|D(t)|$. By the union bound, the probability that either $i$ or $j$ belongs to $D(t)$ is at most $2d(t)/n$. If
so, the $w$-step neighbors of $i$ and $j$ in $G$ and $C_n$ are added to $D(t+1)$. We have seen in Equation (6)
above that $d(t+1) - d(t) \le 6w$ in that case, and otherwise $d(t+1) = d(t)$. Therefore,

$$\mathbb{E}[d(t+1) \mid d(t)] \le d(t) + 6w \cdot \frac{2d(t)}{n} = d(t)\left(1 + \frac{12w}{n}\right) \le d(t)\, e^{12w/n}.$$

This shows that the sequence of random variables $Y_t = d(t)e^{-12wt/n}$ is a supermartingale. The
first statement of the lemma follows immediately from this fact, together with the initial condition
$Y_0 = \left(\frac{2w-1}{L}\right) n$. We prove the second statement using Azuma's Inequality, which means we first
need an upper bound of the form $|Y_t - Y_{t+1}| \le c_t$ almost surely. The calculation

$$|Y_t - e^{-12w/n} Y_t| + |Y_{t+1} - e^{-12w/n} Y_t| \le 12w\,(Y_t/n) + 6w\,e^{-12w(t+1)/n} \le 18w\,e^{-12wt/n}$$

justifies setting $c_t = 18we^{-12wt/n}$. Note that $\sum_{t=0}^{\infty} c_t^2 = (18w)^2\left(1 - e^{-24w/n}\right)^{-1} < 18wn$, from which
the second statement of the lemma follows, since the event in question is $Y_t - Y_0 > Y_0$. □
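The growth of the tainted set can also be explored numerically. The sketch below simulates only the taint bookkeeping, not the labels, under three assumptions that are not in the text: nodes are numbered from 0, the two proposed locations are sampled as a distinct uniform pair, and the update rule is read literally (only $w$-step neighbors of $i$ and $j$ are added). The initial set uses the congruence classes $-w+1, \ldots, w-1$ mod $L$ as stated above.

```python
import math
import random


def neighbors_both(i, n, L, w):
    """Union of the w-step neighbors of i in the n-cycle and in G (0-based nodes)."""
    base, r = (i // L) * L, i % L
    out = set()
    for k in range(1, w + 1):
        out.update({(i + k) % n, (i - k) % n})                  # n-cycle
        out.update({base + (r + k) % L, base + (r - k) % L})    # disjoint rings
    return out


def simulate_taint(n, L, w, steps, rng):
    """Return [d(0), d(1), ..., d(steps)] for one run of the taint process."""
    # Residues -w+1, ..., w-1 modulo L, i.e. r < w or r > L - w.
    tainted = {i for i in range(n) if i % L < w or i % L > L - w}
    sizes = [len(tainted)]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)        # proposed swap locations
        if i in tainted or j in tainted:
            tainted |= neighbors_both(i, n, L, w) | neighbors_both(j, n, L, w)
        sizes.append(len(tainted))
    return sizes


if __name__ == "__main__":
    n, L, w, steps, trials = 2000, 100, 2, 500, 200
    rng = random.Random(0)
    runs = [simulate_taint(n, L, w, steps, rng) for _ in range(trials)]
    mean_final = sum(r[-1] for r in runs) / trials
    bound = runs[0][0] * math.exp(12 * w * steps / n)   # d(0) * e^{12wt/n}
    print(f"mean d({steps}) = {mean_final:.1f}, bound = {bound:.1f}")
```

Each step adds at most $6w$ nodes, matching Equation (6), and the empirical mean of $d(t)$ can be compared against the bound $d(0)\,e^{12wt/n}$ from the first statement of the lemma.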

Bounding the asymmetry of the process. The segregation process on a disjoint union of
rings of length $L(w)$ can be analyzed using Wormald's differential equation technique, a method
that applies to families of vector-valued discrete-time stochastic processes indexed by an integer
$n$, whose behavior converges to a solution of a continuous-time differential equation as $n$ tends to
infinity. We can represent a state of the segregation process on $G$ by a $2^{L(w)}$-dimensional vector
$\zeta$ whose components are indexed by strings $\sigma \in \{x,o\}^{L(w)}$. The component $\zeta_\sigma$ is defined to be the
number of nodes $i$ such that the ring containing $i$ is labeled (say, in clockwise order starting from
$i$) with the string $\sigma$.
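As an illustration of this representation, the state vector can be computed from a list of ring labelings by counting rotations. This is a minimal sketch, assuming each ring is given as a string over `x` and `o` read clockwise; the function name `state_vector` and the sparse dictionary encoding (strings with count zero are simply absent) are choices made here for illustration.

```python
from collections import Counter


def state_vector(rings):
    """Map each labeling string sigma to zeta_sigma.

    `rings` is a list of equal-length strings over {'x', 'o'}, one per ring,
    read clockwise. zeta_sigma counts the nodes i whose ring, read clockwise
    starting from i, spells sigma.
    """
    zeta = Counter()
    for ring in rings:
        for start in range(len(ring)):
            # Rotation of the ring that begins at node `start`.
            zeta[ring[start:] + ring[:start]] += 1
    return zeta


if __name__ == "__main__":
    zeta = state_vector(["xxoo", "xoxo", "xxxx"])
    for sigma, count in sorted(zeta.items()):
        print(sigma, count)
```

Because every node contributes exactly one rotation, the components always sum to $n$, so $\zeta/n$ lies in the probability simplex.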

It is not difficult to work out a formula for the conditional expectation $\mathbb{E}[\zeta_\sigma(t+1) \mid \zeta(t)]$. We
simply have to enumerate all of the ways that the number of rings with labeling $\sigma$ could increase
or decrease in a single step, and work out their probabilities. For any $j \in \{1, \ldots, L(w)\}$ and
$\sigma, \sigma', \sigma'' \in \{x,o\}^{L(w)}$, consider a proposed swap between the $j$th node of a ring $R_1$ whose labeling
(starting from the first node) is $\sigma'$, and the first node of a ring $R_2$ whose labeling is $\sigma''$. Define
a coefficient $a = a(j,\sigma,\sigma',\sigma'')$ as follows. If $\sigma \ne \sigma'$ and the proposed swap results in $R_1$ having
labeling $\sigma$ instead of $\sigma'$, then $a = 1$. If $\sigma = \sigma'$ and the proposed swap results in $R_1$ having a
labeling other than $\sigma$, then $a = -1$. Otherwise $a = 0$. Having made these definitions, we can derive
the formula

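$$\mathbb{E}[\zeta_\sigma(t+1) - \zeta_\sigma(t) \mid \zeta(t) = \zeta] = 2 \sum_{j=1}^{L(w)} \sum_{\sigma',\sigma''} a(j,\sigma,\sigma',\sigma'') \left(\frac{\zeta_{\sigma'}\,\zeta_{\sigma''}}{n^2}\right) + O\left(\frac{L(w)}{n}\right). \tag{7}$$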

The first term accounts for the expected net change in the number of instances of $\sigma$ when going
from step $t$ to step $t+1$. (The factor of 2 is because either of the two locations selected for the
proposed swap at time $t$ could be the one that belongs to the ring $R_1$.) None of the terms in the
sum accounts for the probability that both selected locations belong to the same ring; this event
has probability $L(w)/n$, and it accounts for the $O(L(w)/n)$ correction term at the end of (7).

We are now in a position to apply Theorem 5.1 of [52], using the quadratic function $f(\zeta/n)$
obtained by removing the $O(L(w)/n)$ term from the right side of (7). That theorem has three
hypotheses: a boundedness hypothesis that is trivially satisfied because $|\zeta_\sigma(t+1) - \zeta_\sigma(t)| \le 2L(w)$,
a trend hypothesis that corresponds to Equation (7), with our error term $O(L(w)/n)$ playing the
role of the term that Wormald denotes by $\lambda_1$, and a Lipschitz hypothesis that is satisfied because
we have explicitly written our function $f$ as a sum of a bounded (i.e. independent of $n$) number
of quadratic functions whose coefficients are independent of $n$ as well. Wormald's Theorem has
two conclusions. The first asserts the existence of a unique solution to the differential equation:
for $z$ in the open set $D = \{\zeta \mid 0 < \zeta_\sigma < 1 \ \forall \sigma\}$ the system of differential equations $\frac{dz}{dx} = f(z)$
has a unique solution in $D$ satisfying the initial condition $z(0) = \zeta(0)/n$ and extending to points
arbitrarily close to the boundary of $D$. The second, more important, conclusion is that for a
sufficiently large constant $C$, with probability $1 - O(n^{1/4} \exp(-n^{1/4}))$, $\zeta_\sigma(t) = n z_\sigma(t/n) + O(n^{3/4})$
for all $t \le \kappa n$, provided that the constant $\kappa$ is such that the differential equation solution $z$ satisfies
$\min_\sigma \min_{0 \le x \le \kappa} \{z_\sigma(x)\} \ge C n^{-1/4}$.

We will now apply Wormald's theorem to prove that the numbers of unhappy x's and o's remain
nearly balanced with high probability, by exploiting symmetry properties of the initial condition
and of the differential equation itself. For a string $\sigma \in \{x,o\}^{L(w)}$ let $\bar{\sigma}$ denote the string obtained
from $\sigma$ by replacing every occurrence of x with o and vice versa. For a $2^{L(w)}$-dimensional vector
$z$, let $\iota(z)$ denote the vector whose $\sigma$ component is $z_{\bar{\sigma}}$ for all $\sigma$. It is straightforward to verify
that the function $f$ defining our differential equation satisfies $\iota(f(z)) = f(\iota(z))$ for all $z$, either by
analyzing the formula (7) defining $f$, or by directly appealing to the symmetry of the rules defining
the segregation process. Consequently, the fixed-point set of $\iota$ is invariant under the differential
equation: if $z(0)$ is a fixed point of $\iota$ then so is $z(x)$.

The only remaining difficulty is that our discrete-time process (as opposed to its continuum
limit) does not start from an initial state vector that is literally invariant under the mapping $\iota$; it
is only approximately invariant. The following paragraphs quantify the error in this approximation
and its contribution to the overall error term in our application of Wormald's Theorem.

Define a function $u(\sigma)$ by specifying that $u(\sigma) = 1$ if a ring with labeling $\sigma$ contains an unhappy
x in its first node, $u(\sigma) = -1$ if the first node is an unhappy o, and $u(\sigma) = 0$ otherwise. Consider
the linear function

$$\Lambda(z) = \sum_{\sigma} u(\sigma)\, z_\sigma.$$

Note that $\Lambda(\zeta(t)/n)$ is simply the difference between the fraction of nodes containing unhappy
individuals of type x and o at time $t$, and that $\Lambda = 0$ on the fixed-point set of $\iota$. For constants $c, \kappa > 0$,
let $S(c, \kappa)$ denote the set of vectors $z$ such that the differential equation solution with $z(0) = z$
satisfies $|\Lambda(z(x))| < c$ for all $x \in [0, \kappa]$. By the continuity properties of ordinary differential equations
(e.g., pages 135-136 of [Hj) it follows that $S(c, \kappa)$ is an open set. Furthermore, it contains the set
$E = \{z \mid \iota(z) = z,\ \forall \sigma\ z_\sigma \ge 0,\ \sum_\sigma z_\sigma = 1\}$. Since $E$ is compact, it follows that there is a constant
$\delta(c, \kappa)$ such that every point within distance $\delta(c, \kappa)$ of $E$ belongs to $S(c, \kappa)$.
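The relationship between the color-swap symmetry and the imbalance functional can be made concrete in a few lines. This is a sketch under stated assumptions: a node is taken to be unhappy when fewer than $w$ of its $2w$ nearest neighbors share its type (the satisfaction rule from the abstract), rings have length greater than $2w$, and vectors are stored as dictionaries from labeling strings to values. The names `u`, `flip`, `iota`, and `imbalance` are illustrative.

```python
def u(sigma, w):
    """+1 if the first node of ring `sigma` is an unhappy x, -1 if an unhappy o, else 0."""
    L = len(sigma)
    me = sigma[0]
    # Count same-type nodes among the w neighbors on each side of node 0.
    same = sum(sigma[k % L] == me for k in range(-w, w + 1) if k != 0)
    if same >= w:                 # at least half of the 2w neighbors agree: happy
        return 0
    return 1 if me == "x" else -1


def flip(sigma):
    """Swap every x with o and vice versa."""
    return sigma.translate(str.maketrans("xo", "ox"))


def iota(z):
    """The involution: component sigma of iota(z) equals z[flip(sigma)]."""
    return {flip(sigma): value for sigma, value in z.items()}


def imbalance(z, w):
    """The linear functional sum over sigma of u(sigma) * z_sigma."""
    return sum(u(sigma, w) * value for sigma, value in z.items())


if __name__ == "__main__":
    z = {"xooo": 3, "oxxx": 1, "xxoo": 6}
    print(imbalance(z, 1), imbalance(iota(z), 1))
```

Since `u(flip(sigma), w) == -u(sigma, w)`, the functional changes sign under `iota`, which is why it vanishes on every vector fixed by the symmetry.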

The expected squared distance from $\zeta(0)/n$ to $E$ is easy to bound from above, because $\zeta(0)/n -
\iota(\zeta(0))/n$ is a sum of $n/L(w)$ independent random vectors (corresponding to the $n/L(w)$ rings),
each having mean 0 and expected squared length at most $(L(w)/n)^2$. Consequently, by Markov's
inequality, the probability that the squared distance of $\zeta(0)/n$ from $E$ exceeds $\delta^2 = (\delta(c, \kappa))^2$ is at
most $\frac{L(w)}{\delta^2 n}$.
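The Markov step can be unpacked as follows. This is a sketch: the names $X_r$ for the per-ring contributions and the shorthand $z_0 = \zeta(0)/n$ are introduced here for illustration, and the second line uses the observation that the midpoint of $z_0$ and $\iota(z_0)$ is fixed by $\iota$, nonnegative, and sums to 1, so it lies in $E$.

```latex
\begin{align*}
z_0 - \iota(z_0) &= \sum_{r=1}^{n/L(w)} X_r,
\qquad
\mathbb{E}\,\|z_0 - \iota(z_0)\|^2
  = \sum_{r=1}^{n/L(w)} \mathbb{E}\,\|X_r\|^2
  \le \frac{n}{L(w)} \left(\frac{L(w)}{n}\right)^{2}
  = \frac{L(w)}{n}, \\
\operatorname{dist}(z_0, E)
  &\le \Bigl\| z_0 - \tfrac{1}{2}\bigl(z_0 + \iota(z_0)\bigr) \Bigr\|
  = \tfrac{1}{2}\,\|z_0 - \iota(z_0)\|, \\
\Pr\bigl[\operatorname{dist}(z_0, E)^2 > \delta^2\bigr]
  &\le \frac{\mathbb{E}\,\operatorname{dist}(z_0, E)^2}{\delta^2}
  \le \frac{L(w)}{\delta^2 n}.
\end{align*}
```

The cross terms in the first line vanish because the $X_r$ are independent with mean 0.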

Combining all of these arguments, we have the following result. 

Theorem 3. Consider the segregation process on a ring of length $n$ with window size $w$. Suppose
$c = c(w)$ and $\kappa = \kappa(w)$ are positive numbers such that $c > 2e^{12w\kappa}\left(\frac{2w-1}{L}\right)$. Then there is a positive
constant $\gamma = \gamma(c, \kappa)$ such that for all sufficiently large $n$, with probability greater than $1 - \frac{3}{\gamma n}$, at
every time $0 \le t \le \kappa n$, the numbers of unhappy x's and o's differ by at most $3cn$.



Proof. Set the constants $c, \kappa$ as in the above discussion, and set $\gamma = \delta^2 / L(w)$ where $\delta = \delta(c, \kappa)$
is defined above. There are three possible reasons that the numbers of unhappy x's and o's could
differ by more than $3cn$ at some time $0 \le t \le \kappa n$.

1. The distance of the initial vector $\zeta(0)/n$ from the set $E$ is greater than $\delta(c, \kappa)$.
We have seen above that the probability of this event is at most $\frac{1}{\gamma n}$.

2. The vector $\zeta(0)/n$ is within distance $\delta(c, \kappa)$ of $E$, yet at some time $0 \le t \le \kappa n$, the numbers
of unhappy x's and o's differ by more than $2cn$ in the segregation process on $G$.

The assumption that $\zeta(0)/n$ is within distance $\delta(c, \kappa)$ of $E$ implies that it belongs to $S(c, \kappa)$
and hence that the differential equation solution satisfies $|\Lambda(z(x))| < c$ for all $x \in [0, \kappa]$, and
in particular this holds when $x = t/n$. Our assumption about the imbalance between unhappy
x's and o's at time $t$ implies that $|\Lambda(\zeta(t))/n| > 2c$ and hence that $|\Lambda(\zeta(t)/n - z(t/n))| > c$.
Using the formula defining $\Lambda$, and recalling that $|u(\sigma)| \le 1$ for all $\sigma$, we now see that

$$\sum_{\sigma} |\zeta_\sigma(t) - n z_\sigma(t/n)| > cn,$$

hence there exists $\sigma$ such that $|\zeta_\sigma(t) - n z_\sigma(t/n)| > \frac{cn}{2^{L(w)}}$. For sufficiently large $n$ this exceeds
the $O(n^{3/4})$ error term in Wormald's theorem, hence the probability of this case occurring is
$O(n^{1/4} \exp(-n^{1/4}))$, which is less than $\frac{1}{\gamma n}$ for sufficiently large $n$.

3. At some time $0 \le t \le \kappa n$, the numbers of unhappy x's and o's differ by at most $2cn$ in the
segregation process on $G$, yet they differ by more than $3cn$ in the segregation process on $C_n$.
In this case, the number of tainted nodes at time $t$ must be greater than $cn$. Therefore, by
Lemma A.6 and the hypothesis on $c$, the probability that this event occurs at any particular
time $t$ is at most $\exp\left(-\frac{n}{36w}\left(\frac{2w-1}{L}\right)^2\right)$. Taking
the union bound over all $t$ in the range $0, \ldots, \kappa n$, we find that the probability of this case is
bounded above by $\kappa n \exp\left(-\frac{n}{36w}\left(\frac{2w-1}{L}\right)^2\right)$, which is again less than $\frac{1}{\gamma n}$ for sufficiently large $n$.

Combining the upper bounds for the probabilities of these three bad events using the union bound, 
we obtain the probability estimate stated in the theorem. □

In order to obtain Theorem 2 from Theorem 3 it is necessary to specify values for $L(w)$, $c(w)$, $\kappa(w)$.
We define

$$c(w) = \tfrac{1}{3}\, w^{-4} \tag{8}$$

$$\kappa(w) = 2w^2 \ln(w) \tag{9}$$

$$L(w) = 12\, e^{12w\kappa(w)}\, w^5 \tag{10}$$

These choices are justified by the following proposition, which completes the proof of Theorem 2.



Proposition A.7. Define $c(w), \kappa(w), L(w)$ as in (8)-(10) above. Let $T_0$ be the earliest time at which
fewer than $3n/w^2$ individuals are impatient and let $T_1$ be the earliest time at which the numbers of
unhappy x's and o's differ by more than $3c(w)n = n/w^4$. The probability that $T_0 \le \kappa(w)n < T_1$ is
at least $1 - 1/w$, for all sufficiently large $n$.

Proof. Define a pending node to be a node that has not yet reached its satisfaction time. Let
$\phi_x(t), \phi_o(t)$ denote the fraction of all $n$ nodes that are pending at time $t$ and labeled with x, o, respectively,
and let $\phi(t) = \phi_x(t) + \phi_o(t)$. Similarly, let $\psi_x(t), \psi_o(t)$ denote the fraction of all $n$ nodes that are unhappy at
time $t$ and labeled with x, o, respectively, and let $\psi(t) = \psi_x(t) + \psi_o(t)$.

Note that $\phi_x(t+1) = \phi_x(t)$ unless the proposed swap at time $t$ involves a pending x and an
unhappy o, in which case $\phi_x(t+1) = \phi_x(t) - \frac{1}{n}$ since exactly one x is no longer pending. The
probability of proposing a swap between a pending x and an unhappy o at time $t$ is $2\phi_x(t)\psi_o(t)$.
Hence

$$\mathbb{E}[\phi_x(t+1) - \phi_x(t) \mid \phi_x(t), \psi_o(t)] = -\frac{2}{n}\, \phi_x(t)\, \psi_o(t). \tag{11}$$

Similarly,

$$\mathbb{E}[\phi_o(t+1) - \phi_o(t) \mid \phi_o(t), \psi_x(t)] = -\frac{2}{n}\, \phi_o(t)\, \psi_x(t). \tag{12}$$

Using $\phi_*(t), \psi_*(t)$ as shorthand for the quadruple of random variables $(\phi_x(t), \phi_o(t), \psi_x(t), \psi_o(t))$
and summing, we obtain

$$\mathbb{E}[\phi(t+1) - \phi(t) \mid \phi_*(t), \psi_*(t)] = -\frac{2}{n} \bigl( \phi_x(t)\psi_o(t) + \phi_o(t)\psi_x(t) \bigr). \tag{13}$$



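We can obtain an upper bound on the right side of (13), provided that $t < \min\{T_0, T_1\}$. Indeed,
for any such $t$ we have $\psi(t) \ge 3/w^2$ (since there are at least $3n/w^2$ impatient individuals at time $t$
and all of them are unhappy), whereas

$$|\psi_x(t) - \psi_o(t)| \le 1/w^4 < 1/w^2 \le \psi(t)/3.$$

Since $\psi_x(t) + \psi_o(t) = \psi(t)$, it follows that $\min\{\psi_x(t), \psi_o(t)\} = \frac{1}{2}\bigl(\psi(t) - |\psi_x(t) - \psi_o(t)|\bigr) \ge \psi(t)/3$ and consequently,

$$\mathbb{E}[\phi(t+1) - \phi(t) \mid \phi_*(t), \psi_*(t)] \le -\frac{2}{n} \bigl( \phi_x(t) + \phi_o(t) \bigr) \frac{\psi(t)}{3} \le -\frac{2}{w^2 n}\, \phi(t), \tag{14}$$

where the last inequality used the facts that $\psi(t) \ge 3/w^2$ and $\phi(t) = \phi_x(t) + \phi_o(t)$. Rearranging
terms in (14) we obtain the bound

$$\mathbb{E}[\phi(t+1) \mid \phi_*(t), \psi_*(t)] \le \left(1 - \frac{2}{w^2 n}\right) \phi(t) \le e^{-2/(w^2 n)}\, \phi(t),$$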

which implies that the sequence of random variables defined by

$$Y_t = \begin{cases} e^{2t/(w^2 n)}\, \phi(t) & \text{if } t \le \min\{T_0, T_1\} \\ Y_{t-1} & \text{otherwise} \end{cases}$$

is a supermartingale. The initial condition $Y_0 \le 1$ now implies that for all $t$, $\mathbb{E}[Y_t] \le 1$.
Now we specialize to a fixed value of $t$, namely

$$t = \kappa(w)n = 2w^2 \ln(w)\, n.$$

If $t < \min\{T_0, T_1\}$ then the number of impatient nodes at time $t$ is at least $3n/w^2$, hence $\phi(t) \ge
3/w^2$. This implies

$$Y_t = e^{2t/(w^2 n)}\, \phi(t) \ge w^4 \cdot (3/w^2) = 3w^2.$$

Using Markov's inequality,

$$\Pr[\kappa(w)n < \min\{T_0, T_1\}] \le \Pr[Y_t \ge 3w^2] \le \frac{1}{3w^2}.$$
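The arithmetic behind these two displays is easy to check numerically. The sketch below is illustrative only; the function names are not from the paper, and it relies on $\kappa(w) = 2w^2 \ln(w)$ from (9) and on the bound $\mathbb{E}[Y_t] \le 1$ established above.

```python
import math


def growth_factor(w):
    """e^{2t/(w^2 n)} evaluated at t = kappa(w) * n with kappa(w) = 2 w^2 ln(w)."""
    kappa = 2 * w**2 * math.log(w)
    # The factor n cancels, leaving e^{4 ln w} = w^4.
    return math.exp(2 * kappa / w**2)


def markov_bound(w):
    """Markov's inequality: Pr[Y_t >= 3 w^2] <= E[Y_t] / (3 w^2) <= 1 / (3 w^2)."""
    return 1 / (3 * w**2)


if __name__ == "__main__":
    for w in (2, 5, 10):
        print(w, growth_factor(w), w**4, markov_bound(w))
```

The choice of $\kappa(w)$ is what makes the exponential factor equal to exactly $w^4$, so that the lower bound $\phi(t) \ge 3/w^2$ translates into $Y_t \ge 3w^2$.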

Theorem 3 says that for some constant $\gamma(w)$,

$$\Pr[\kappa(w)n < T_1] \ge 1 - \frac{3}{\gamma(w)\, n}.$$

But if $\kappa(w)n < T_1$ and $\kappa(w)n \not< \min\{T_0, T_1\}$, it means that $T_0 \le \kappa(w)n < T_1$. Thus, by the union
bound,

$$\Pr[T_0 \le \kappa(w)n < T_1] \ge 1 - \frac{1}{3w^2} - \frac{3}{\gamma(w)\, n}.$$

For sufficiently large $n$, the right side is greater than $1 - 1/w$, as desired. □